PREFACE



These notes build upon a course I taught at the University of Maryland during 
the fall of 1983. My great thanks go to Martino Bardi, who took careful notes, 
saved them all these years and recently mailed them to me. Faye Yeager typed up 
his notes into a first draft of these lectures as they now appear. Scott Armstrong 
read over the notes and suggested many improvements: thanks, Scott. Stephen 
Moye of the American Math Society helped me a lot with AMSTeX versus LaTeX 
issues. My thanks also to Atilla Yilmaz for spotting lots of typos and errors, which 
I have corrected. 

I have radically modified much of the notation (to be consistent with my other 
writings), updated the references, added several new examples, and provided a proof 
of the Pontryagin Maximum Principle. As this is a course for undergraduates, I have 
dispensed in certain proofs with various measurability and continuity issues, and as 
compensation have added various critiques as to the lack of total rigor. 

This current version of the notes is not yet complete, but meets I think the 
usual high standards for material posted on the internet. Please email me at 
evans@math.berkeley.edu with any corrections or comments. 
CHAPTER 1: INTRODUCTION



1.1. The basic problem 

1.2. Some examples 

1.3. A geometric solution 

1.4. Overview 



1.1 THE BASIC PROBLEM. 

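DYNAMICS. We open our discussion by considering an ordinary differential
equation (ODE) having the form

(1.1)   \[ \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t)) \quad (t > 0), \qquad \mathbf{x}(0) = x^0. \]

We are here given the initial point $x^0 \in \mathbb{R}^n$ and the function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$. The
unknown is the curve $\mathbf{x} : [0, \infty) \to \mathbb{R}^n$, which we interpret as the dynamical evolution
of the state of some "system".

CONTROLLED DYNAMICS. We generalize a bit and suppose now that
$\mathbf{f}$ depends also upon some "control" parameters belonging to a set $A \subset \mathbb{R}^m$; so that
$\mathbf{f} : \mathbb{R}^n \times A \to \mathbb{R}^n$. Then if we select some value $a \in A$ and consider the corresponding
dynamics

        \[ \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), a) \quad (t > 0), \qquad \mathbf{x}(0) = x^0, \]

we obtain the evolution of our system when the parameter is constantly set to the
value $a$.

The next possibility is that we change the value of the parameter as the system
evolves. For instance, suppose we define the function $\boldsymbol{\alpha} : [0, \infty) \to A$ this way:

        \[ \boldsymbol{\alpha}(t) = \begin{cases} a_1 & 0 \le t \le t_1 \\ a_2 & t_1 < t \le t_2 \\ a_3 & t_2 < t \le t_3 \\ \ \vdots \end{cases} \]

for times $0 < t_1 < t_2 < t_3 < \dots$ and parameter values $a_1, a_2, a_3, \dots \in A$; and we then
solve the dynamical equation

(1.2)   \[ \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \boldsymbol{\alpha}(t)) \quad (t > 0), \qquad \mathbf{x}(0) = x^0. \]

The picture illustrates the resulting evolution. The point is that the system may
behave quite differently as we change the control parameters.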
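The effect of switching the parameter can be tried out numerically. The following is a minimal sketch, assuming a scalar state and the forward Euler method; the helper names `piecewise_constant` and `euler_response`, and the test dynamics f(x, a) = a x, are our own illustrative choices and do not come from these notes.

```python
from bisect import bisect_left


def piecewise_constant(switch_times, values):
    """Build alpha(t) taking values[i] on the i-th time interval.

    switch_times = [t1, t2, ...] with 0 < t1 < t2 < ...,
    and values must have exactly one more entry than switch_times:
    alpha(t) = values[0] for 0 <= t <= t1, values[1] for t1 < t <= t2, etc.
    """
    def alpha(t):
        # bisect_left counts the switching times strictly less than t.
        return values[bisect_left(switch_times, t)]
    return alpha


def euler_response(f, alpha, x0, T, steps=2000):
    """Approximate x(T) for x'(t) = f(x(t), alpha(t)), x(0) = x0 (scalar state)."""
    dt = T / steps
    x = x0
    for i in range(steps):
        t = i * dt
        x = x + dt * f(x, alpha(t))  # one forward Euler step
    return x


if __name__ == "__main__":
    # Hypothetical dynamics f(x, a) = a * x: growth for a = 1, decay for a = -1.
    f = lambda x, a: a * x
    constant = euler_response(f, lambda t: 1.0, 1.0, 2.0)
    switched = euler_response(f, piecewise_constant([1.0], [1.0, -1.0]), 1.0, 2.0)
    print("parameter held at a = 1:     x(2) ~", constant)
    print("switch to a = -1 at t = 1:   x(2) ~", switched)
```

With the parameter held at 1 the state grows throughout, whereas switching to -1 halfway brings it back close to its initial value; this is the sense in which the response depends on the control.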
[Figure: Controlled dynamics. A trajectory starting at $x^0$.]



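More generally, we call a function $\boldsymbol{\alpha} : [0, \infty) \to A$ a control. Corresponding to
each control, we consider the ODE

(ODE)   \[ \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \boldsymbol{\alpha}(t)) \quad (t > 0), \qquad \mathbf{x}(0) = x^0, \]

and regard the trajectory $\mathbf{x}(\cdot)$ as the corresponding response of the system.

NOTATION. (i) We will write

        \[ \mathbf{f}(x, a) = \begin{pmatrix} f^1(x, a) \\ \vdots \\ f^n(x, a) \end{pmatrix} \]

to display the components of $\mathbf{f}$, and similarly put

        \[ \mathbf{x}(t) = \begin{pmatrix} x^1(t) \\ \vdots \\ x^n(t) \end{pmatrix}. \]

We will therefore write vectors as columns in these notes and use boldface for
vector-valued functions, the components of which have superscripts.

(ii) We also introduce

        \[ \mathcal{A} = \{ \boldsymbol{\alpha} : [0, \infty) \to A \mid \boldsymbol{\alpha}(\cdot) \text{ measurable} \} \]

to denote the collection of all admissible controls.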
Note very carefully that our solution $\mathbf{x}(\cdot)$ of (ODE) depends upon $\boldsymbol{\alpha}(\cdot)$ and the initial
condition. Consequently our notation would be more precise, but more complicated,
if we were to write

        \[ \mathbf{x}(\cdot) = \mathbf{x}(\cdot, \boldsymbol{\alpha}(\cdot), x^0), \]

displaying the dependence of the response $\mathbf{x}(\cdot)$ upon the control and the initial
value. □

PAYOFFS. Our overall task will be to determine what is the "best" control for
our system. For this we need to specify a payoff (or reward) criterion. Let
us define the payoff functional

(P)     \[ P[\boldsymbol{\alpha}(\cdot)] := \int_0^T r(\mathbf{x}(t), \boldsymbol{\alpha}(t)) \, dt + g(\mathbf{x}(T)), \]

where $\mathbf{x}(\cdot)$ solves (ODE) for the control $\boldsymbol{\alpha}(\cdot)$. Here $r : \mathbb{R}^n \times A \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$
are given, and we call $r$ the running payoff and $g$ the terminal payoff. The terminal
time $T > 0$ is given as well.

THE BASIC PROBLEM. Our aim is to find a control $\boldsymbol{\alpha}^*(\cdot)$ which maximizes
the payoff. In other words, we want

        \[ P[\boldsymbol{\alpha}^*(\cdot)] \ge P[\boldsymbol{\alpha}(\cdot)] \]

for all controls $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$. Such a control $\boldsymbol{\alpha}^*(\cdot)$ is called optimal.

This task presents us with these mathematical issues:

(i) Does an optimal control exist?

(ii) How can we characterize an optimal control mathematically?

(iii) How can we construct an optimal control?

These turn out to be sometimes subtle problems, as the following collection of
examples illustrates.

1.2 EXAMPLES

EXAMPLE 1: CONTROL OF PRODUCTION AND CONSUMPTION.

Suppose we own, say, a factory whose output we can control. Let us begin to
construct a mathematical model by setting

        $x(t)$ = amount of output produced at time $t \ge 0$.



We suppose that we consume some fraction of our output at each time, and likewise
can reinvest the remaining fraction. Let us denote

        $\alpha(t)$ = fraction of output reinvested at time $t \ge 0$.

This will be our control, and is subject to the obvious constraint that

        \[ 0 \le \alpha(t) \le 1 \quad \text{for each time } t \ge 0. \]

Given such a control, the corresponding dynamics are provided by the ODE

        \[ \dot{x}(t) = k \alpha(t) x(t), \qquad x(0) = x^0, \]

the constant $k > 0$ modelling the growth rate of our reinvestment. Let us take as a
payoff functional

        \[ P[\alpha(\cdot)] = \int_0^T (1 - \alpha(t)) x(t) \, dt. \]

The meaning is that we want to maximize our total consumption of the output, our
consumption at a given time $t$ being $(1 - \alpha(t)) x(t)$. This model fits into our general
framework for $n = m = 1$, once we put

        \[ A = [0, 1], \quad f(x, a) = kax, \quad r(x, a) = (1 - a)x, \quad g \equiv 0. \]

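[Figure: A bang-bang control. Graph of $\alpha^*(t)$ on $[0, T]$, with $\alpha^* = 1$ before the switching time $t^*$ and $\alpha^* = 0$ after it.]

As we will see later in §4.4.2, an optimal control $\alpha^*(\cdot)$ is given by

        \[ \alpha^*(t) = \begin{cases} 1 & \text{if } 0 \le t \le t^* \\ 0 & \text{if } t^* < t \le T \end{cases} \]

for an appropriate switching time $0 \le t^* \le T$. In other words, we should reinvest
all the output (and therefore consume nothing) up until time $t^*$, and afterwards, we
should consume everything (and therefore reinvest nothing). The switchover time
$t^*$ will have to be determined. We call $\alpha^*(\cdot)$ a bang-bang control. □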
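For controls of this one-switch form the payoff can be written in closed form, which makes it easy to experiment with the switching time. While α = 1 the output grows as x(t) = x^0 e^{kt} and nothing is consumed; after the switch the output stays at x^0 e^{k t^*} and is consumed entirely, so the payoff equals (T - t^*) x^0 e^{k t^*}. The sketch below, with hypothetical function names of our own, scans a grid of switching times. It only compares controls within this one-switch family and is not a proof of optimality; that is the task of §4.4.2.

```python
import math


def bang_bang_payoff(t_switch, T, k, x0):
    """Total consumption when alpha = 1 on [0, t_switch] and alpha = 0 afterwards.

    Closed form: (T - t_switch) * x0 * exp(k * t_switch).
    """
    return (T - t_switch) * x0 * math.exp(k * t_switch)


def best_switch_time(T, k, x0, grid=10001):
    """Grid search over switching times in [0, T] (illustrative, not exact)."""
    candidates = [T * i / (grid - 1) for i in range(grid)]
    return max(candidates, key=lambda s: bang_bang_payoff(s, T, k, x0))


if __name__ == "__main__":
    T, k, x0 = 3.0, 1.0, 1.0  # sample values chosen for illustration only
    t_star = best_switch_time(T, k, x0)
    print("best switching time on the grid:", t_star)
    print("calculus prediction T - 1/k:    ", T - 1.0 / k)
```

Setting the derivative of the closed-form payoff to zero gives t^* = T - 1/k whenever T > 1/k, and the grid search agrees with this value up to the grid spacing. When T ≤ 1/k the maximum over this family is at t^* = 0.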



EXAMPLE 2: REPRODUCTIVE STRATEGIES IN SOCIAL INSECTS

The next example is from Chapter 2 of the book Caste and Ecology in Social
Insects, by G. Oster and E. O. Wilson [O-W]. We attempt to model how social
insects, say a population of bees, determine the makeup of their society.

Let us write $T$ for the length of the season, and introduce the variables

        $w(t)$ = number of workers at time $t$
        $q(t)$ = number of queens at time $t$
        $\alpha(t)$ = fraction of colony effort devoted to increasing work force

The control $\alpha$ is constrained by our requiring that

        \[ 0 \le \alpha(t) \le 1. \]

We continue to model by introducing dynamics for the number of workers and
the number of queens. The worker population evolves according to

        \[ \dot{w}(t) = -\mu w(t) + b s(t) \alpha(t) w(t), \qquad w(0) = w^0. \]

Here $\mu$ is a given constant (a death rate), $b$ is another constant, and $s(t)$ is the
known rate at which each worker contributes to the bee economy.

We suppose also that the population of queens changes according to

        \[ \dot{q}(t) = -\nu q(t) + c (1 - \alpha(t)) s(t) w(t), \qquad q(0) = q^0, \]

for constants $\nu$ and $c$.

Our goal, or rather the bees', is to maximize the number of queens at time $T$:

        \[ P[\alpha(\cdot)] = q(T). \]

So in terms of our general notation, we have $\mathbf{x}(t) = (w(t), q(t))^T$ and $x^0 = (w^0, q^0)^T$.
We are taking the running payoff to be $r \equiv 0$, and the terminal payoff $g(w, q) = q$.

The answer will again turn out to be a bang-bang control, as we will explain
later. □

EXAMPLE 3: A PENDULUM.

We look next at a hanging pendulum, for which

        $\theta(t)$ = angle at time $t$.

If there is no external force, then we have the equation of motion

        \[ \ddot{\theta}(t) + \lambda \dot{\theta}(t) + \omega^2 \theta(t) = 0, \qquad \theta(0) = \theta_1, \ \dot{\theta}(0) = \theta_2; \]

the solution of which is a damped oscillation, provided $\lambda > 0$.

Now let $\alpha(\cdot)$ denote an applied torque, subject to the physical constraint that

        \[ |\alpha| \le 1. \]

Our dynamics now become

        \[ \ddot{\theta}(t) + \lambda \dot{\theta}(t) + \omega^2 \theta(t) = \alpha(t), \qquad \theta(0) = \theta_1, \ \dot{\theta}(0) = \theta_2. \]

Define $x_1(t) = \theta(t)$, $x_2(t) = \dot{\theta}(t)$, and $\mathbf{x}(t) = (x_1(t), x_2(t))$. Then we can write the
evolution as the system

        \[ \dot{\mathbf{x}}(t) = \begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} = \begin{pmatrix} \dot{\theta} \\ \ddot{\theta} \end{pmatrix} = \begin{pmatrix} x_2 \\ -\lambda x_2 - \omega^2 x_1 + \alpha(t) \end{pmatrix} = \mathbf{f}(\mathbf{x}, \alpha). \]

We introduce as well

        \[ P[\alpha(\cdot)] = -\int_0^\tau 1 \, dt = -\tau, \]

for

        $\tau = \tau(\alpha(\cdot))$ = first time that $\mathbf{x}(\tau) = 0$ (that is, $\theta(\tau) = \dot{\theta}(\tau) = 0$).

We want to maximize $P[\cdot]$, meaning that we want to minimize the time it takes to
bring the pendulum to rest.

Observe that this problem does not quite fall within the general framework
described in §1.1, since the terminal time is not fixed, but rather depends upon the
control. This is called a fixed endpoint, free time problem. □

EXAMPLE 4: A MOON LANDER

This model asks us to bring a spacecraft to a soft landing on the lunar surface,
using the least amount of fuel.

We introduce the notation

        $h(t)$ = height at time $t$
        $v(t)$ = velocity = $\dot{h}(t)$
        $m(t)$ = mass of spacecraft (changing as fuel is burned)
        $\alpha(t)$ = thrust at time $t$

We assume that

        \[ 0 \le \alpha(t) \le 1, \]

and Newton's law tells us that

        \[ m \ddot{h} = -g m + \alpha, \]

the right hand side being the difference of the gravitational force and the thrust of
the rocket. This system is modeled by the ODE

        \[ \dot{v}(t) = -g + \frac{\alpha(t)}{m(t)}, \qquad \dot{h}(t) = v(t), \qquad \dot{m}(t) = -k \alpha(t). \]

[Figure: A spacecraft landing on the moon, at height $h(t)$ above the moon's surface.]

We summarize these equations in the form

        \[ \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \alpha(t)) \]

for $\mathbf{x}(t) = (v(t), h(t), m(t))$.

We want to minimize the amount of fuel used up, that is, to maximize the
amount remaining once we have landed. Thus

        \[ P[\alpha(\cdot)] = m(\tau), \]

where

        $\tau$ denotes the first time that $h(\tau) = v(\tau) = 0$.

This is a variable endpoint problem, since the final time is not given in advance.
We have also the extra constraints

        \[ h(t) \ge 0, \qquad m(t) \ge 0. \]

□

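EXAMPLE 5: ROCKET RAILROAD CAR.

Imagine a railroad car powered by rocket engines on each side. We introduce
the variables

        $q(t)$ = position at time $t$
        $v(t) = \dot{q}(t)$ = velocity at time $t$
        $\alpha(t)$ = thrust from rockets,

where

        \[ -1 \le \alpha(t) \le 1, \]

the sign depending upon which engine is firing.

[Figure: A rocket car on a train track.]

We want to figure out how to fire the rockets, so as to arrive at the origin
with zero velocity in a minimum amount of time. Assuming the car has mass $m$,
the law of motion is

        \[ m \ddot{q}(t) = \alpha(t). \]

We rewrite by setting $\mathbf{x}(t) = (q(t), v(t))^T$. Then, taking $m = 1$ for simplicity,

        \[ \dot{\mathbf{x}}(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \mathbf{x}(t) + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \alpha(t), \qquad \mathbf{x}(0) = x^0. \]

Since our goal is to steer to the origin in minimum time, we take

        \[ P[\alpha(\cdot)] = -\int_0^\tau 1 \, dt = -\tau, \]

for

        $\tau$ = first time that $q(\tau) = v(\tau) = 0$.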

1.3 A GEOMETRIC SOLUTION. 

To illustrate how actually to solve a control problem, in this last section we 
introduce some ad hoc calculus and geometry methods for the rocket car problem, 
Example 5 above. 

First of all, let us guess that to find an optimal solution we will need only to
consider the cases $\alpha = 1$ or $\alpha = -1$. In other words, we will focus our attention only
upon those controls for which at each moment of time either the left or the right
rocket engine is fired at full power. (We will later see in Chapter 2 some theoretical
justification for looking only at such controls.)

CASE 1: Suppose first that $\alpha \equiv 1$ for some time interval, during which

        \[ \dot{q} = v, \qquad \dot{v} = 1. \]

Then

        \[ v \dot{v} = \dot{q}, \]

and so

        \[ \tfrac{1}{2} (v^2)^{\cdot} = \dot{q}. \]

Let $t_0$ belong to the time interval where $\alpha \equiv 1$ and integrate from $t_0$ to $t$:

        \[ \frac{v^2(t)}{2} - \frac{v^2(t_0)}{2} = q(t) - q(t_0). \]

Consequently

(1.1)   \[ v^2(t) = 2 q(t) + \underbrace{(v^2(t_0) - 2 q(t_0))}_{b}. \]

In other words, so long as the control is set for $\alpha \equiv 1$, the trajectory stays on the
curve $v^2 = 2q + b$ for some constant $b$.

CASE 2: Suppose now $\alpha \equiv -1$ on some time interval. Then as above

        \[ \dot{q} = v, \qquad \dot{v} = -1, \]

and hence

        \[ \tfrac{1}{2} (v^2)^{\cdot} = -\dot{q}. \]

Let $t_1$ belong to an interval where $\alpha \equiv -1$ and integrate:

(1.2)   \[ v^2(t) = -2 q(t) + \underbrace{(2 q(t_1) + v^2(t_1))}_{c}. \]

Consequently, as long as the control is set for $\alpha \equiv -1$, the trajectory stays on the
curve $v^2 = -2q + c$ for some constant $c$.
GEOMETRIC INTERPRETATION. Formula (1.1) says if $\alpha \equiv 1$, then $(q(t), v(t))$
lies on a parabola of the form

        \[ v^2 = 2q + b. \]

Similarly, (1.2) says if $\alpha \equiv -1$, then $(q(t), v(t))$ lies on a parabola

        \[ v^2 = -2q + c. \]

Now we can design an optimal control $\alpha^*(\cdot)$, which causes the trajectory to jump
between the families of right- and left-pointing parabolas, as drawn. Say we start
at the black dot, and wish to steer to the origin. This we accomplish by first setting
the control to the value $\alpha = -1$, causing us to move down along the second family of
parabolas. We then switch to the control $\alpha = 1$, and thereupon move to a parabola
from the first family, along which we move up and to the left, ending up at the
origin. See the picture.
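The two-phase steering just described can be checked by direct computation in one special case. The sketch below assumes unit mass and a start at rest at a position q_0 > 0 (assumptions of ours, made to keep the algebra short); the function names are hypothetical. Starting from (q_0, 0) with α = -1, the state moves along v^2 = -2q + 2q_0, and this curve meets the parabola v^2 = 2q through the origin when q = q_0/2, that is, after time sqrt(q_0). Switching to α = 1 there brings the car to the origin after a further time sqrt(q_0).

```python
import math


def coast(q, v, a, t):
    """Exact state after time t under constant thrust a, for q'' = a (unit mass)."""
    return q + v * t + 0.5 * a * t * t, v + a * t


def steer_from_rest(q0):
    """Steer from (q0, 0) with q0 > 0 to the origin: first a = -1, then a = +1.

    Returns the state at the switch, the final state, and the total time.
    """
    t_switch = math.sqrt(q0)                     # time spent in each phase
    q1, v1 = coast(q0, 0.0, -1.0, t_switch)      # move down along v^2 = -2q + 2*q0
    q2, v2 = coast(q1, v1, 1.0, t_switch)        # move along v^2 = 2q to the origin
    return (q1, v1), (q2, v2), 2.0 * t_switch


if __name__ == "__main__":
    switch_state, final_state, total_time = steer_from_rest(4.0)
    print("state at the switch:", switch_state)
    print("final state:        ", final_state)
    print("total time:         ", total_time)
```

This confirms only that the bang-bang trajectory reaches the origin; that no other control does so faster is the subject of the later chapters.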

1.4 OVERVIEW. 

Here are the topics we will cover in this course: 

• Chapter 2: Controllability, bang-bang principle. 

In this chapter, we introduce the simplest class of dynamics, those linear in both
the state $\mathbf{x}(\cdot)$ and the control $\boldsymbol{\alpha}(\cdot)$, and derive algebraic conditions ensuring that
the system can be steered into a given terminal state. We introduce as well some
abstract theorems from functional analysis and employ them to prove the existence
of so-called "bang-bang" optimal controls.

[Figure: How to get to the origin in minimal time (the rocket car of §1.3).]

• Chapter 3: Time-optimal control.

In Chapter 3 we continue to study linear control problems, and turn our attention
to finding optimal controls that steer our system into a given state as quickly
as possible. We introduce a maximization principle useful for characterizing an
optimal control, and will later recognize this as a first instance of the Pontryagin
Maximum Principle.

• Chapter 4: Pontryagin Maximum Principle. 

Chapter 4's discussion of the Pontryagin Maximum Principle and its variants 
is at the heart of these notes. We postpone proof of this important insight to the 
Appendix, preferring instead to illustrate its usefulness with many examples with 
nonlinear dynamics. 

• Chapter 5: Dynamic programming. 

Dynamic programming provides an alternative approach to designing optimal
controls, assuming we can solve a nonlinear partial differential equation, called
the Hamilton-Jacobi-Bellman equation. This chapter explains the basic theory,
works out some examples, and discusses connections with the Pontryagin Maximum
Principle.

• Chapter 6: Game theory. 

We discuss briefly two-person, zero-sum differential games and how dynamic 
programming and maximum principle methods apply. 

• Chapter 7: Introduction to stochastic control theory. 

This chapter provides a very brief introduction to the control of stochastic
differential equations by dynamic programming techniques. The Itô stochastic calculus
tells us how the random effects modify the corresponding Hamilton-Jacobi-Bellman
equation.

• Appendix: Proof of the Pontryagin Maximum Principle.

We provide here the proof of this important assertion, discussing clearly the
key ideas.
CHAPTER 2: CONTROLLABILITY, BANG-BANG PRINCIPLE



2.1 Definitions 

2.2 Quick review of linear ODE 

2.3 Controllability of linear equations 

2.4 Observability 

2.5 Bang-bang principle 

2.6 References 

2.1 DEFINITIONS. 

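We first recall from Chapter 1 the basic form of our controlled ODE:

(ODE)   \[ \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \boldsymbol{\alpha}(t)), \qquad \mathbf{x}(0) = x^0. \]

Here $x^0 \in \mathbb{R}^n$, $\mathbf{f} : \mathbb{R}^n \times A \to \mathbb{R}^n$, $\boldsymbol{\alpha} : [0, \infty) \to A$ is the control, and $\mathbf{x} : [0, \infty) \to \mathbb{R}^n$
is the response of the system.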

This chapter addresses the following basic

CONTROLLABILITY QUESTION: Given the initial point $x^0$ and a "target"
set $S \subset \mathbb{R}^n$, does there exist a control steering the system to $S$ in finite time?

For the time being we will therefore not introduce any payoff criterion that 
would characterize an "optimal" control, but instead will focus on the question as 
to whether or not there exist controls that steer the system to a given goal. In 
this chapter we will mostly consider the problem of driving the system to the origin 
S = {0}. 

DEFINITION. We define the reachable set for time $t$ to be

        $\mathcal{C}(t)$ = set of initial points $x^0$ for which there exists a
        control such that $\mathbf{x}(t) = 0$,

and the overall reachable set

        $\mathcal{C}$ = set of initial points $x^0$ for which there exists a
        control such that $\mathbf{x}(t) = 0$ for some finite time $t$.

Note that

        \[ \mathcal{C} = \bigcup_{t \ge 0} \mathcal{C}(t). \]

Hereafter, let $\mathbb{M}^{n \times m}$ denote the set of all $n \times m$ matrices. We assume for the
rest of this and the next chapter that our ODE is linear in both the state $\mathbf{x}(\cdot)$ and
the control $\boldsymbol{\alpha}(\cdot)$, and consequently has the form

(ODE)   \[ \dot{\mathbf{x}}(t) = M \mathbf{x}(t) + N \boldsymbol{\alpha}(t) \quad (t > 0), \qquad \mathbf{x}(0) = x^0, \]

where $M \in \mathbb{M}^{n \times n}$ and $N \in \mathbb{M}^{n \times m}$. We assume the set $A$ of control parameters is
a cube in $\mathbb{R}^m$:

        \[ A = [-1, 1]^m = \{ a \in \mathbb{R}^m \mid |a_i| \le 1, \ i = 1, \dots, m \}. \]



2.2 QUICK REVIEW OF LINEAR ODE. 

This section records for later reference some basic facts about linear systems of 
ordinary differential equations. 

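DEFINITION. Let $\mathbf{X}(\cdot) : \mathbb{R} \to \mathbb{M}^{n \times n}$ be the unique solution of the matrix
ODE

        \[ \dot{\mathbf{X}}(t) = M \mathbf{X}(t) \quad (t \in \mathbb{R}), \qquad \mathbf{X}(0) = I. \]

We call $\mathbf{X}(\cdot)$ a fundamental solution, and sometimes write

        \[ \mathbf{X}(t) = e^{tM} := \sum_{k=0}^{\infty} \frac{t^k M^k}{k!}, \]

the last formula being the definition of the exponential $e^{tM}$. Observe that

        \[ \mathbf{X}^{-1}(t) = \mathbf{X}(-t). \]

THEOREM 2.1 (SOLVING LINEAR SYSTEMS OF ODE).

(i) The unique solution of the homogeneous system of ODE

        \[ \dot{\mathbf{x}}(t) = M \mathbf{x}(t), \qquad \mathbf{x}(0) = x^0 \]

is

        \[ \mathbf{x}(t) = \mathbf{X}(t) x^0 = e^{tM} x^0. \]

(ii) The unique solution of the nonhomogeneous system

        \[ \dot{\mathbf{x}}(t) = M \mathbf{x}(t) + \mathbf{f}(t), \qquad \mathbf{x}(0) = x^0 \]

is

        \[ \mathbf{x}(t) = \mathbf{X}(t) x^0 + \mathbf{X}(t) \int_0^t \mathbf{X}^{-1}(s) \mathbf{f}(s) \, ds. \]

This expression is the variation of parameters formula.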
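As a small worked instance of these formulas, consider the nilpotent 2 x 2 matrix M below. The choice of M is ours, made for illustration: it governs a unit-mass particle whose position is the first component and whose velocity is the second. Since M^2 = 0, the exponential series terminates after two terms.

```latex
M = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad M^2 = 0
\quad\Longrightarrow\quad
\mathbf{X}(t) = e^{tM} = I + tM = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix},
\qquad
\mathbf{X}^{-1}(t) = \mathbf{X}(-t) = \begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix}.

% Homogeneous solution from part (i), writing x^0 = (x^0_1, x^0_2)^T:
\mathbf{x}(t) = e^{tM} x^0 = \begin{pmatrix} x^0_1 + t\,x^0_2 \\ x^0_2 \end{pmatrix}.
```

One checks directly that the two displayed matrices multiply to the identity, and that the solution describes motion at the constant velocity x^0_2.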



2.3 CONTROLLABILITY OF LINEAR EQUATIONS. 
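According to the variation of parameters formula, the solution of (ODE) for a
given control $\boldsymbol{\alpha}(\cdot)$ is

        \[ \mathbf{x}(t) = \mathbf{X}(t) x^0 + \mathbf{X}(t) \int_0^t \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds, \]

where $\mathbf{X}(t) = e^{tM}$. Furthermore, observe that

        \[ x^0 \in \mathcal{C}(t) \]

if and only if

(2.1)   there exists a control $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$ such that $\mathbf{x}(t) = 0$

if and only if

(2.2)   \[ 0 = \mathbf{X}(t) x^0 + \mathbf{X}(t) \int_0^t \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds \quad \text{for some control } \boldsymbol{\alpha}(\cdot) \in \mathcal{A} \]

if and only if

(2.3)   \[ x^0 = -\int_0^t \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds \quad \text{for some control } \boldsymbol{\alpha}(\cdot) \in \mathcal{A}. \]

We make use of these formulas to study the reachable set: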

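THEOREM 2.2 (STRUCTURE OF REACHABLE SET).

(i) The reachable set $\mathcal{C}$ is symmetric and convex.

(ii) Also, if $x^0 \in \mathcal{C}(\bar{t})$, then $x^0 \in \mathcal{C}(t)$ for all times $t \ge \bar{t}$.

DEFINITIONS.

(i) We say a set $S$ is symmetric if $x \in S$ implies $-x \in S$.

(ii) The set $S$ is convex if $x, \hat{x} \in S$ and $0 \le \lambda \le 1$ imply $\lambda x + (1 - \lambda) \hat{x} \in S$.

Proof. 1. (Symmetry) Let $t \ge 0$ and $x^0 \in \mathcal{C}(t)$. Then $x^0 = -\int_0^t \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds$
for some admissible control $\boldsymbol{\alpha} \in \mathcal{A}$. Therefore $-x^0 = -\int_0^t \mathbf{X}^{-1}(s) N (-\boldsymbol{\alpha}(s)) \, ds$, and
$-\boldsymbol{\alpha} \in \mathcal{A}$ since the set $A$ is symmetric. Therefore $-x^0 \in \mathcal{C}(t)$, and so each set $\mathcal{C}(t)$ is
symmetric. It follows that $\mathcal{C}$ is symmetric.

2. (Convexity) Take $x^0, \hat{x}^0 \in \mathcal{C}$; so that $x^0 \in \mathcal{C}(t)$, $\hat{x}^0 \in \mathcal{C}(\hat{t})$ for appropriate
times $t, \hat{t} \ge 0$. Assume $t \le \hat{t}$. Then

        \[ x^0 = -\int_0^t \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds \quad \text{for some control } \boldsymbol{\alpha} \in \mathcal{A}, \]
        \[ \hat{x}^0 = -\int_0^{\hat{t}} \mathbf{X}^{-1}(s) N \hat{\boldsymbol{\alpha}}(s) \, ds \quad \text{for some control } \hat{\boldsymbol{\alpha}} \in \mathcal{A}. \]

Define a new control

        \[ \tilde{\boldsymbol{\alpha}}(s) := \begin{cases} \boldsymbol{\alpha}(s) & \text{if } 0 \le s \le t \\ 0 & \text{if } s > t. \end{cases} \]

Then

        \[ x^0 = -\int_0^{\hat{t}} \mathbf{X}^{-1}(s) N \tilde{\boldsymbol{\alpha}}(s) \, ds, \]

and hence $x^0 \in \mathcal{C}(\hat{t})$. Now let $0 \le \lambda \le 1$, and observe

        \[ \lambda x^0 + (1 - \lambda) \hat{x}^0 = -\int_0^{\hat{t}} \mathbf{X}^{-1}(s) N (\lambda \tilde{\boldsymbol{\alpha}}(s) + (1 - \lambda) \hat{\boldsymbol{\alpha}}(s)) \, ds. \]

Therefore $\lambda x^0 + (1 - \lambda) \hat{x}^0 \in \mathcal{C}(\hat{t}) \subseteq \mathcal{C}$.

3. Assertion (ii) follows from the foregoing construction: if $x^0 \in \mathcal{C}(\bar{t})$ and $t \ge \bar{t}$,
then extending the control by zero after time $\bar{t}$ shows that $x^0 \in \mathcal{C}(t)$. □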



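A SIMPLE EXAMPLE. Let $n = 2$ and $m = 1$, $A = [-1, 1]$, and write $\mathbf{x}(t) =
(x^1(t), x^2(t))^T$. Suppose

        \[ \dot{x}^1 = 0, \qquad \dot{x}^2 = \alpha(t). \]

This is a system of the form $\dot{\mathbf{x}} = M \mathbf{x} + N \alpha$, for

        \[ M = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \qquad N = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

Clearly $\mathcal{C} = \{ (x_1, x_2) \mid x_1 = 0 \}$, the $x_2$-axis. □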

We next wish to establish some general algebraic conditions ensuring that $\mathcal{C}$ contains
a neighborhood of the origin.

DEFINITION. The controllability matrix is

        \[ G = G(M, N) := \underbrace{[N, MN, M^2 N, \dots, M^{n-1} N]}_{n \times (mn) \text{ matrix}}. \]

THEOREM 2.3 (CONTROLLABILITY MATRIX). We have

        \[ \operatorname{rank} G = n \]

if and only if

        \[ 0 \in \mathcal{C}^\circ. \]

NOTATION. We write $\mathcal{C}^\circ$ for the interior of the set $\mathcal{C}$. Remember that

        rank of $G$ = number of linearly independent rows of $G$
                    = number of linearly independent columns of $G$.

Clearly $\operatorname{rank} G \le n$. □
Proof. 1. Suppose first that $\operatorname{rank} G < n$. This means that the linear span of
the columns of $G$ has dimension less than or equal to $n - 1$. Thus there exists a
vector $b \in \mathbb{R}^n$, $b \ne 0$, orthogonal to each column of $G$. This implies

        \[ b^T G = 0 \]

and so

        \[ b^T N = b^T M N = \dots = b^T M^{n-1} N = 0. \]

2. We claim next that in fact

(2.4)   \[ b^T M^k N = 0 \quad \text{for all positive integers } k. \]

To confirm this, recall that

        \[ p(\lambda) := \det(\lambda I - M) \]

is the characteristic polynomial of $M$. The Cayley-Hamilton Theorem states that

        \[ p(M) = 0. \]

So if we write

        \[ p(\lambda) = \lambda^n + \beta_{n-1} \lambda^{n-1} + \dots + \beta_1 \lambda + \beta_0, \]

then

        \[ p(M) = M^n + \beta_{n-1} M^{n-1} + \dots + \beta_1 M + \beta_0 I = 0. \]

Therefore

        \[ M^n = -\beta_{n-1} M^{n-1} - \beta_{n-2} M^{n-2} - \dots - \beta_1 M - \beta_0 I, \]

and so

        \[ b^T M^n N = b^T (-\beta_{n-1} M^{n-1} - \dots) N = 0. \]

Similarly, $b^T M^{n+1} N = b^T (-\beta_{n-1} M^n - \dots) N = 0$, etc. The claim (2.4) is proved.
Now notice that

        \[ b^T \mathbf{X}^{-1}(s) N = b^T e^{-sM} N = b^T \sum_{k=0}^{\infty} \frac{(-s)^k M^k}{k!} N = \sum_{k=0}^{\infty} \frac{(-s)^k}{k!} b^T M^k N = 0, \]

according to (2.4).

3. Assume next that $x^0 \in \mathcal{C}(t)$. This is equivalent to having

        \[ x^0 = -\int_0^t \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds \quad \text{for some control } \boldsymbol{\alpha}(\cdot) \in \mathcal{A}. \]

Then

        \[ b \cdot x^0 = -\int_0^t b^T \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds = 0. \]

This says that $b$ is orthogonal to $x^0$. In other words, $\mathcal{C}$ must lie in the hyperplane
orthogonal to $b \ne 0$. Consequently $\mathcal{C}^\circ = \emptyset$.
4. Conversely, assume $0 \notin \mathcal{C}^\circ$. Thus $0 \notin \mathcal{C}(t)^\circ$ for all $t > 0$. Since $\mathcal{C}(t)$ is
convex, there exists a supporting hyperplane to $\mathcal{C}(t)$ through 0. This means that
there exists $b \ne 0$ such that $b \cdot x^0 \le 0$ for all $x^0 \in \mathcal{C}(t)$.

Choose any $x^0 \in \mathcal{C}(t)$. Then

        \[ x^0 = -\int_0^t \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds \]

for some control $\boldsymbol{\alpha}$, and therefore

        \[ 0 \ge b \cdot x^0 = -\int_0^t b^T \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds. \]

Thus

        \[ \int_0^t b^T \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds \ge 0 \quad \text{for all controls } \boldsymbol{\alpha}(\cdot). \]

We assert that therefore

(2.5)   \[ b^T \mathbf{X}^{-1}(s) N \equiv 0, \]

a proof of which follows as a lemma below. We rewrite (2.5) as

(2.6)   \[ b^T e^{-sM} N \equiv 0. \]

Let $s = 0$ to see that $b^T N = 0$. Next differentiate (2.6) with respect to $s$, to find
that

        \[ b^T (-M) e^{-sM} N \equiv 0. \]

For $s = 0$ this says

        \[ b^T M N = 0. \]

We repeatedly differentiate, to deduce

        \[ b^T M^k N = 0 \quad \text{for all } k = 0, 1, \dots, \]

and so $b^T G = 0$. This implies $\operatorname{rank} G < n$, since $b \ne 0$. □

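LEMMA 2.4 (INTEGRAL INEQUALITIES). Assume that

(2.7)   \[ \int_0^t b^T \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds \ge 0 \]

for all $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$. Then

        \[ b^T \mathbf{X}^{-1}(s) N \equiv 0. \]

Proof. Replacing $\boldsymbol{\alpha}$ by $-\boldsymbol{\alpha}$ in (2.7), we see that

        \[ \int_0^t b^T \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds = 0 \]

for all $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$. Define

        \[ v(s) := b^T \mathbf{X}^{-1}(s) N. \]

If $v \not\equiv 0$, then $v(s_0) \ne 0$ for some $s_0$. Then there exists an interval $I$ such that
$s_0 \in I$ and $v \ne 0$ on $I$. Now define $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$ this way:

        \[ \boldsymbol{\alpha}(s) = \begin{cases} 0 & (s \notin I) \\ \dfrac{1}{\sqrt{m}} \dfrac{v(s)^T}{|v(s)|} & (s \in I), \end{cases} \]

where $|v| := \left( \sum_{i=1}^{m} |v_i|^2 \right)^{1/2}$. Then

        \[ 0 = \int_0^t v(s) \boldsymbol{\alpha}(s) \, ds = \int_I \frac{v(s)}{\sqrt{m}} \frac{v(s)^T}{|v(s)|} \, ds = \frac{1}{\sqrt{m}} \int_I |v(s)| \, ds. \]

This implies the contradiction that $v \equiv 0$ in $I$. □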



DEFINITION. We say the linear system (ODE) is controllable if $\mathcal{C} = \mathbb{R}^n$.

THEOREM 2.5 (CRITERION FOR CONTROLLABILITY). Let $A$ be
the cube $[-1, 1]^m$ in $\mathbb{R}^m$. Suppose as well that $\operatorname{rank} G = n$, and $\operatorname{Re} \lambda < 0$ for each
eigenvalue $\lambda$ of the matrix $M$.

Then the system (ODE) is controllable.

Proof. Since $\operatorname{rank} G = n$, Theorem 2.3 tells us that $\mathcal{C}$ contains some ball $B$
centered at 0. Now take any $x^0 \in \mathbb{R}^n$ and consider the evolution

        \[ \dot{\mathbf{x}}(t) = M \mathbf{x}(t), \qquad \mathbf{x}(0) = x^0; \]

in other words, take the control $\boldsymbol{\alpha}(\cdot) \equiv 0$. Since $\operatorname{Re} \lambda < 0$ for each eigenvalue $\lambda$
of $M$, the origin is asymptotically stable. So there exists a time $T$ such that
$\mathbf{x}(T) \in B$. Thus $\mathbf{x}(T) \in B \subset \mathcal{C}$; and hence there exists a control $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$ steering
$\mathbf{x}(T)$ into 0 in finite time. □



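EXAMPLE. We once again consider the rocket railroad car, from §1.2, for which
$n = 2$, $m = 1$, $A = [-1, 1]$, and

        \[ \dot{\mathbf{x}} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \mathbf{x} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \alpha. \]

Then

        \[ G = [N, MN] = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]

Therefore

        \[ \operatorname{rank} G = 2 = n. \]

Also, the characteristic polynomial of the matrix $M$ is

        \[ p(\lambda) = \det(\lambda I - M) = \det \begin{pmatrix} \lambda & -1 \\ 0 & \lambda \end{pmatrix} = \lambda^2. \]

Since the eigenvalues are both 0, we fail to satisfy the hypotheses of Theorem 2.5. □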
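The rank condition is purely algebraic, so it is easy to check by machine. Below is a minimal sketch in plain Python that builds G = [N, MN, ..., M^(n-1) N] and computes its rank by Gaussian elimination over exact fractions. The function names are our own, and the two test systems are the rocket car (with M, N as read off from the matrix G above) and a system with M = 0 and the same N, chosen by us as an example of a rank-deficient G.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def controllability_matrix(M, N):
    """Return G = [N, MN, ..., M^(n-1) N] as a list of rows (n x mn)."""
    n = len(M)
    M = [[Fraction(x) for x in row] for row in M]
    block = [[Fraction(x) for x in row] for row in N]
    G = [list(row) for row in block]
    for _ in range(n - 1):
        block = mat_mul(M, block)                       # next block M^k N
        G = [g_row + b_row for g_row, b_row in zip(G, block)]
    return G


def rank(A):
    """Rank of a matrix by Gaussian elimination in exact arithmetic."""
    A = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r


if __name__ == "__main__":
    car = controllability_matrix([[0, 1], [0, 0]], [[0], [1]])
    print("rocket car: rank G =", rank(car))
    flat = controllability_matrix([[0, 0], [0, 0]], [[0], [1]])
    print("M = 0 case: rank G =", rank(flat))
```

For the rocket car the rank is 2 = n, in agreement with the computation above; for the second system the rank is 1, so by Theorem 2.3 the origin is not an interior point of the reachable set.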



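This example motivates the following extension of the previous theorem:

THEOREM 2.6 (IMPROVED CRITERION FOR CONTROLLABILITY).
Assume $\operatorname{rank} G = n$ and $\operatorname{Re} \lambda \le 0$ for each eigenvalue $\lambda$ of $M$.
Then the system (ODE) is controllable.

Proof. 1. If $\mathcal{C} \ne \mathbb{R}^n$, then the convexity of $\mathcal{C}$ implies that there exist a vector
$b \ne 0$ and a real number $\mu$ such that

(2.8)   \[ b \cdot x^0 \le \mu \]

for all $x^0 \in \mathcal{C}$. Indeed, in the picture we see that $b \cdot (x^0 - z^0) \le 0$; and this implies
(2.8) for $\mu := b \cdot z^0$.

[Figure: the convex set $\mathcal{C}$, a point $z^0$ and the vector $b$.]

We will derive a contradiction.

2. Given $b \ne 0$, $\mu \in \mathbb{R}$, our intention is to find $x^0 \in \mathcal{C}$ so that (2.8) fails. Recall
$x^0 \in \mathcal{C}$ if and only if there exist a time $t > 0$ and a control $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$ such that

        \[ x^0 = -\int_0^t \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds. \]

Then

        \[ b \cdot x^0 = -\int_0^t b^T \mathbf{X}^{-1}(s) N \boldsymbol{\alpha}(s) \, ds. \]

Define

        \[ v(s) := b^T \mathbf{X}^{-1}(s) N. \]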
3. We assert that

(2.9)   \[ v \not\equiv 0. \]

To see this, suppose instead that $v \equiv 0$. Then $k$ times differentiate the expression
$b^T \mathbf{X}^{-1}(s) N$ with respect to $s$ and set $s = 0$, to discover

        \[ b^T M^k N = 0 \]

for $k = 0, 1, 2, \dots$. This implies $b$ is orthogonal to the columns of $G$, and so $\operatorname{rank} G <
n$. This is a contradiction to our hypothesis, and therefore (2.9) holds.

4. Next, define $\boldsymbol{\alpha}(\cdot)$ this way:

        \[ \boldsymbol{\alpha}(s) := \begin{cases} -\dfrac{v(s)^T}{|v(s)|} & \text{if } v(s) \ne 0 \\ 0 & \text{if } v(s) = 0. \end{cases} \]

Then

        \[ b \cdot x^0 = -\int_0^t v(s) \boldsymbol{\alpha}(s) \, ds = \int_0^t |v(s)| \, ds. \]

We want to find a time $t > 0$ so that $\int_0^t |v(s)| \, ds > \mu$. In fact, we assert that

(2.10)  \[ \int_0^\infty |v(s)| \, ds = +\infty. \]

To begin the proof of (2.10), introduce the function

        \[ \phi(t) := \int_t^\infty v(s) \, ds. \]

We will find an ODE that $\phi$ satisfies. Take $p(\cdot)$ to be the characteristic polynomial
of $M$. Then

        \[ p\!\left(-\frac{d}{dt}\right) v(t) = p\!\left(-\frac{d}{dt}\right) [b^T e^{-tM} N] = b^T \left( p\!\left(-\frac{d}{dt}\right) e^{-tM} \right) N = b^T (p(M) e^{-tM}) N \equiv 0, \]

since $p(M) = 0$, according to the Cayley-Hamilton Theorem. But since $p\left(-\frac{d}{dt}\right) v(t) \equiv
0$, it follows that

        \[ -\frac{d}{dt} p\!\left(-\frac{d}{dt}\right) \phi(t) = p\!\left(-\frac{d}{dt}\right) \left( -\frac{d}{dt} \phi \right) = p\!\left(-\frac{d}{dt}\right) v(t) = 0. \]

Hence $\phi$ solves the $(n + 1)$th order ODE

        \[ \frac{d}{dt} p\!\left(-\frac{d}{dt}\right) \phi(t) = 0. \]

We also know $\phi(\cdot) \not\equiv 0$. Let $\mu_1, \dots, \mu_{n+1}$ be the solutions of $\mu p(-\mu) = 0$. According
to ODE theory, we can write

        $\phi(t)$ = sum of terms of the form $p_i(t) e^{\mu_i t}$

for appropriate polynomials $p_i(\cdot)$.

Furthermore, we see that $\mu_{n+1} = 0$ and $\mu_k = -\lambda_k$, where $\lambda_1, \dots, \lambda_n$ are the
eigenvalues of $M$. By assumption $\operatorname{Re} \mu_k \ge 0$, for $k = 1, \dots, n$. If $\int_0^\infty |v(s)| \, ds < \infty$,
then

        \[ |\phi(t)| \le \int_t^\infty |v(s)| \, ds \to 0 \quad \text{as } t \to \infty; \]

that is, $\phi(t) \to 0$ as $t \to \infty$. This is a contradiction to the representation formula
$\phi(t) = \sum p_i(t) e^{\mu_i t}$, with $\operatorname{Re} \mu_i \ge 0$. Assertion (2.10) is proved.

5. Consequently given any $\mu$, there exists $t > 0$ such that

        \[ b \cdot x^0 = \int_0^t |v(s)| \, ds > \mu, \]

a contradiction to (2.8). Therefore $\mathcal{C} = \mathbb{R}^n$. □
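To see the quantities in this proof concretely, we can evaluate them for the rocket railroad car. The computation below is our own illustration and is not part of the proof; M and N are the matrices for which G = [N, MN] has the form displayed in the example preceding the theorem.

```latex
M = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
N = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad
\mathbf{X}^{-1}(s) = e^{-sM} = I - sM = \begin{pmatrix} 1 & -s \\ 0 & 1 \end{pmatrix}.

% For any b = (b_1, b_2)^T \neq 0:
v(s) = b^T \mathbf{X}^{-1}(s) N = b^T \begin{pmatrix} -s \\ 1 \end{pmatrix} = b_2 - s\, b_1 .

% v is not identically zero, and
\int_0^\infty |v(s)| \, ds = \int_0^\infty |b_2 - s\, b_1| \, ds = +\infty .
```

This agrees with (2.9) and (2.10): if b_1 ≠ 0 the integrand grows linearly, and if b_1 = 0 it is the nonzero constant |b_2|. Since rank G = 2 and both eigenvalues of M equal 0, the hypotheses of Theorem 2.6 hold, and so the rocket car is controllable.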



2.4 OBSERVABILITY 

We again consider the linear system of ODE

(ODE)   \[ \dot{\mathbf{x}}(t) = M \mathbf{x}(t), \qquad \mathbf{x}(0) = x^0, \]

where $M \in \mathbb{M}^{n \times n}$.

In this section we address the observability problem, modeled as follows. We
suppose that we can observe

(O)     \[ \mathbf{y}(t) := N \mathbf{x}(t) \quad (t \ge 0), \]

for a given matrix $N \in \mathbb{M}^{m \times n}$. Consequently, $\mathbf{y}(t) \in \mathbb{R}^m$. The interesting situation
is when $m \ll n$ and we interpret $\mathbf{y}(\cdot)$ as low-dimensional "observations" or
"measurements" of the high-dimensional dynamics $\mathbf{x}(\cdot)$.

OBSERVABILITY QUESTION: Given the observations $\mathbf{y}(\cdot)$, can we in principle
reconstruct $\mathbf{x}(\cdot)$? In particular, do observations of $\mathbf{y}(\cdot)$ provide enough
information for us to deduce the initial value $x^0$ for (ODE)?

DEFINITION. The pair (ODE),(O) is called observable if the knowledge of
$\mathbf{y}(\cdot)$ on any time interval $[0, t]$ allows us to compute $x^0$.

More precisely, (ODE),(O) is observable if for all solutions $\mathbf{x}_1(\cdot), \mathbf{x}_2(\cdot)$, $N \mathbf{x}_1(\cdot) \equiv
N \mathbf{x}_2(\cdot)$ on a time interval $[0, t]$ implies $\mathbf{x}_1(0) = \mathbf{x}_2(0)$.

TWO SIMPLE EXAMPLES. (i) If $N \equiv 0$, then clearly the system is not
observable.

(ii) On the other hand, if $m = n$ and $N$ is invertible, then $\mathbf{x}(t) = N^{-1} \mathbf{y}(t)$, and so
the system is clearly observable.

The interesting cases lie between these extremes. □

THEOREM 2.7 (OBSERVABILITY AND CONTROLLABILITY). The
system

(2.11)  \[ \dot{\mathbf{x}}(t) = M \mathbf{x}(t), \qquad \mathbf{y}(t) = N \mathbf{x}(t) \]

is observable if and only if the system

(2.12)  \[ \dot{\mathbf{z}}(t) = M^T \mathbf{z}(t) + N^T \boldsymbol{\alpha}(t), \qquad A = \mathbb{R}^m \]

is controllable, meaning that $\mathcal{C} = \mathbb{R}^n$.

INTERPRETATION. This theorem asserts that somehow "observability and
controllability are dual concepts" for linear systems.
Proof. 1. Suppose (2.11) is not observable. Then there exist points $x^1 \ne x^2 \in
\mathbb{R}^n$, such that

        \[ \dot{\mathbf{x}}_1(t) = M \mathbf{x}_1(t), \quad \mathbf{x}_1(0) = x^1, \]
        \[ \dot{\mathbf{x}}_2(t) = M \mathbf{x}_2(t), \quad \mathbf{x}_2(0) = x^2, \]

but $\mathbf{y}(t) := N \mathbf{x}_1(t) \equiv N \mathbf{x}_2(t)$ for all times $t \ge 0$. Let

        \[ \mathbf{x}(t) := \mathbf{x}_1(t) - \mathbf{x}_2(t), \qquad x^0 := x^1 - x^2. \]

Then

        \[ \dot{\mathbf{x}}(t) = M \mathbf{x}(t), \qquad \mathbf{x}(0) = x^0 \ne 0, \]

but

        \[ N \mathbf{x}(t) = 0 \quad (t \ge 0). \]

Now

        \[ \mathbf{x}(t) = \mathbf{X}(t) x^0 = e^{tM} x^0. \]

Thus

        \[ N e^{tM} x^0 = 0 \quad (t \ge 0). \]

Let $t = 0$, to find $N x^0 = 0$. Then differentiate this expression $k$ times in $t$ and
let $t = 0$, to discover as well that

        \[ N M^k x^0 = 0 \]

for $k = 0, 1, 2, \dots$. Hence $(x^0)^T (M^k)^T N^T = 0$, and hence $(x^0)^T (M^T)^k N^T = 0$.
This implies

        \[ (x^0)^T [N^T, M^T N^T, \dots, (M^T)^{n-1} N^T] = 0. \]

Since $x^0 \ne 0$, $\operatorname{rank} [N^T, \dots, (M^T)^{n-1} N^T] < n$. Thus problem (2.12) is not
controllable. Consequently, (2.12) controllable implies (2.11) is observable.

2. Assume now (2.12) is not controllable. Then $\operatorname{rank} [N^T, \dots, (M^T)^{n-1} N^T] < n$,
and consequently according to Theorem 2.3 there exists $x^0 \ne 0$ such that

        \[ (x^0)^T [N^T, \dots, (M^T)^{n-1} N^T] = 0. \]

That is, $N M^k x^0 = 0$ for all $k = 0, 1, 2, \dots, n - 1$.
We want to show that $\mathbf{y}(t) = N \mathbf{x}(t) \equiv 0$, where

        \[ \dot{\mathbf{x}}(t) = M \mathbf{x}(t), \qquad \mathbf{x}(0) = x^0. \]

According to the Cayley-Hamilton Theorem, we can write

        \[ M^n = -\beta_{n-1} M^{n-1} - \dots - \beta_0 I \]

for appropriate constants. Consequently $N M^n x^0 = 0$. Likewise,

        \[ M^{n+1} = M (-\beta_{n-1} M^{n-1} - \dots - \beta_0 I) = -\beta_{n-1} M^n - \dots - \beta_0 M; \]

and so $N M^{n+1} x^0 = 0$. Similarly, $N M^k x^0 = 0$ for all $k$.
Now

        \[ \mathbf{x}(t) = \mathbf{X}(t) x^0 = e^{Mt} x^0 = \sum_{k=0}^{\infty} \frac{t^k M^k}{k!} x^0; \]

and therefore $N \mathbf{x}(t) = N \sum_{k=0}^{\infty} \frac{t^k M^k}{k!} x^0 = 0$.

We have shown that if (2.12) is not controllable, then (2.11) is not observable.

□
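The duality can be tried out on a small case. The following worked computation is our own illustration, using the 2 x 2 matrix M of the rocket car (position in the first component, velocity in the second) and two hypothetical choices of the observation matrix N. By the theorem together with the rank criterion used in the proof, observability comes down to the rank of [N^T, M^T N^T].

```latex
M = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
M^T = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.

% (a) Observe the position only: N = (1 \ \ 0).
[N^T, M^T N^T] = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\qquad \text{rank} = 2 = n \quad \text{(observable)}.

% (b) Observe the velocity only: N = (0 \ \ 1).
[N^T, M^T N^T] = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},
\qquad \text{rank} = 1 < n \quad \text{(not observable)}.
```

Both outcomes match intuition. With no control acting, the velocity is constant, so watching the position over a time interval reveals both the initial position and the velocity, whereas watching the velocity alone says nothing about where the car started.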



2.5 BANG-BANG PRINCIPLE. 

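For this section, we will again take $A$ to be the cube $[-1, 1]^m$ in $\mathbb{R}^m$.

DEFINITION. A control $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$ is called bang-bang if for each time $t \ge 0$
and each index $i = 1, \dots, m$, we have $|\alpha^i(t)| = 1$, where

        \[ \boldsymbol{\alpha}(t) = \begin{pmatrix} \alpha^1(t) \\ \vdots \\ \alpha^m(t) \end{pmatrix}. \]

THEOREM 2.8 (BANG-BANG PRINCIPLE). Let $t > 0$ and suppose $x^0 \in
\mathcal{C}(t)$, for the system

        \[ \dot{\mathbf{x}}(t) = M \mathbf{x}(t) + N \boldsymbol{\alpha}(t). \]

Then there exists a bang-bang control $\boldsymbol{\alpha}(\cdot)$ which steers $x^0$ to 0 at time $t$.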

To prove the theorem we need some tools from functional analysis, among 
them the Krein-Milman Theorem, expressing the geometric fact that every bounded 
convex set has an extreme point. 
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2.5.1 SOME FUNCTIONAL ANALYSIS. We will study the "geometry" of 
certain infinite dimensional spaces of functions. 

NOTATION: 

L°° = L°°(0, t; M m ) = {<*(•) : (0, t) — > R m | sup |a(s)| < oo}. 

0<s<t 

= sup I ct(s) I . 

0<s<t 

DEFINITION. Let a n G L°° for n = 1, . . . and ex G L°°. We say a n con- 
verges to a in the weak* sense, written 

provided 

/ o: n (s) ■ v(s) (is — > / ct(s) ■ v(s) ds 
Jo Jo 

as n — > oo, for all v(-) : [0, t] — > M m satisfying J* |v(s)| (is < oo. 

We will need the following useful weak* compactness theorem for L°°: 

ALAOGLU'S THEOREM. Let ct n e A, n = 1, . . . . TTien t/iere exzste a 
subsequence ct nk and ex. E A, such that 

OL nk CX. 

DEFINITIONS. (i) The set $K$ is convex if for all $x, \hat{x} \in K$ and all real numbers $0 \le \lambda \le 1$,

$$\lambda x + (1-\lambda)\hat{x} \in K.$$

(ii) A point $z \in K$ is called extreme provided there do not exist distinct points $x, \hat{x} \in K$ and $0 < \lambda < 1$ such that

$$z = \lambda x + (1-\lambda)\hat{x}.$$

KREIN-MILMAN THEOREM. Let $K$ be a convex, nonempty subset of $L^\infty$, which is compact in the weak* topology.
Then $K$ has at least one extreme point.
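As a finite-dimensional illustration of the definition of an extreme point (an analogy only, not the infinite-dimensional setting of the theorem), consider the cube $[-1,1]^m$ itself. Its extreme points are exactly the vertices, and any other point can be written as the midpoint of two distinct points of the cube. The function names below are hypothetical and chosen for this sketch.

```python
from typing import List, Optional, Tuple


def is_extreme_in_cube(z: List[float]) -> bool:
    """A point of [-1, 1]^m is extreme exactly when every coordinate is +1 or -1."""
    return all(abs(zi) == 1.0 for zi in z)


def split_point(z: List[float]) -> Optional[Tuple[List[float], List[float]]]:
    """For a non-extreme z in the cube, return distinct cube points x, x_hat
    with z = 0.5 * x + 0.5 * x_hat. Return None when z is a vertex."""
    for i, zi in enumerate(z):
        if abs(zi) < 1.0:
            eps = 1.0 - abs(zi)  # room left before leaving the cube
            x, x_hat = list(z), list(z)
            x[i] = zi + eps
            x_hat[i] = zi - eps
            return x, x_hat
    return None
```

For example, `split_point([0.5, 1.0])` moves only the first coordinate, which is the same idea as the perturbation $\alpha^* \pm \varepsilon\beta$ used later in the proof of Theorem 2.10.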

2.5.2 APPLICATION TO BANG-BANG CONTROLS. 

The foregoing abstract theory will be useful for us in the following setting. We will take $K$ to be the set of controls which steer $x^0$ to $0$ at time $t$, prove it satisfies the hypotheses of the Krein-Milman Theorem and finally show that an extreme point is a bang-bang control.

So consider again the linear dynamics

$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = Mx(t) + N\alpha(t) \\ x(0) = x^0. \end{cases}$$

Take $x^0 \in C(t)$ and write

$$K = \{\alpha(\cdot) \in \mathcal{A} \mid \alpha(\cdot) \text{ steers } x^0 \text{ to } 0 \text{ at time } t\}.$$

LEMMA 2.9 (GEOMETRY OF SET OF CONTROLS). The collection $K$ of admissible controls satisfies the hypotheses of the Krein-Milman Theorem.

Proof. Since $x^0 \in C(t)$, we see that $K \ne \emptyset$.

Next we show that $K$ is convex. For this, recall that $\alpha(\cdot) \in K$ if and only if

$$x^0 = -\int_0^t X^{-1}(s)N\alpha(s)\,ds.$$

Now take also $\hat\alpha \in K$ and $0 \le \lambda \le 1$. Then

$$x^0 = -\int_0^t X^{-1}(s)N\hat\alpha(s)\,ds;$$

and so

$$x^0 = -\int_0^t X^{-1}(s)N\big(\lambda\alpha(s) + (1-\lambda)\hat\alpha(s)\big)\,ds.$$

Hence $\lambda\alpha + (1-\lambda)\hat\alpha \in K$.

Lastly, we confirm the compactness. Let $\alpha_n \in K$ for $n = 1, \dots$. According to Alaoglu's Theorem there exist $n_k \to \infty$ and $\alpha \in \mathcal{A}$ such that $\alpha_{n_k} \overset{*}{\rightharpoonup} \alpha$. We need to show that $\alpha \in K$.

Now $\alpha_{n_k} \in K$ implies

$$x^0 = -\int_0^t X^{-1}(s)N\alpha_{n_k}(s)\,ds \to -\int_0^t X^{-1}(s)N\alpha(s)\,ds$$

by definition of weak* convergence. Hence $\alpha \in K$. □



We can now apply the Krein-Milman Theorem to deduce that there exists an extreme point $\alpha^* \in K$. What is interesting is that such an extreme point corresponds to a bang-bang control.

THEOREM 2.10 (EXTREMALITY AND BANG-BANG PRINCIPLE). The control $\alpha^*(\cdot)$ is bang-bang.
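Proof. 1. We must show that for almost all times $0 \le s \le t$ and for each $i = 1, \dots, m$, we have

$$|\alpha^{i*}(s)| = 1.$$

Suppose not. Then there exists an index $i \in \{1, \dots, m\}$ and a subset $E \subset [0,t]$ of positive measure such that $|\alpha^{i*}(s)| < 1$ for $s \in E$. In fact, there exist a number $\varepsilon > 0$ and a subset $F \subseteq E$ such that

$$|F| > 0 \quad \text{and} \quad |\alpha^{i*}(s)| \le 1 - \varepsilon \text{ for } s \in F.$$

Define

$$I_F(\beta(\cdot)) := \int_F X^{-1}(s)N\boldsymbol{\beta}(s)\,ds,$$

for

$$\boldsymbol{\beta}(\cdot) := (0, \dots, \beta(\cdot), \dots, 0)^T,$$

the function $\beta$ in the $i$-th slot. Choose any real-valued function $\beta(\cdot) \not\equiv 0$, such that

$$I_F(\beta(\cdot)) = 0$$

and $|\beta(\cdot)| \le 1$. Define

$$\alpha_1(\cdot) := \alpha^*(\cdot) + \varepsilon\boldsymbol{\beta}(\cdot), \qquad \alpha_2(\cdot) := \alpha^*(\cdot) - \varepsilon\boldsymbol{\beta}(\cdot),$$

where we redefine $\beta$ to be zero off the set $F$.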

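2. We claim that

$$\alpha_1(\cdot), \alpha_2(\cdot) \in K.$$

To see this, observe that

$$-\int_0^t X^{-1}(s)N\alpha_1(s)\,ds = -\int_0^t X^{-1}(s)N\alpha^*(s)\,ds - \varepsilon\int_0^t X^{-1}(s)N\boldsymbol{\beta}(s)\,ds = x^0 - \varepsilon\underbrace{\int_F X^{-1}(s)N\boldsymbol{\beta}(s)\,ds}_{I_F(\beta(\cdot)) = 0} = x^0.$$

Note also $\alpha_1(\cdot) \in \mathcal{A}$. Indeed,

$$\begin{cases} \alpha_1(s) = \alpha^*(s) & (s \notin F) \\ \alpha_1(s) = \alpha^*(s) + \varepsilon\boldsymbol{\beta}(s) & (s \in F). \end{cases}$$

But on the set $F$, we have $|\alpha^{i*}(s)| \le 1 - \varepsilon$, and therefore, in the $i$-th component (the only one that changes),

$$|\alpha_1^i(s)| \le |\alpha^{i*}(s)| + \varepsilon|\beta(s)| \le 1 - \varepsilon + \varepsilon = 1.$$

Similar considerations apply for $\alpha_2$. Hence $\alpha_1, \alpha_2 \in K$, as claimed above.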

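3. Finally, observe that

$$\begin{cases} \alpha_1 = \alpha^* + \varepsilon\boldsymbol{\beta}, & \alpha_1 \ne \alpha^* \\ \alpha_2 = \alpha^* - \varepsilon\boldsymbol{\beta}, & \alpha_2 \ne \alpha^*. \end{cases}$$

But

$$\tfrac{1}{2}\alpha_1 + \tfrac{1}{2}\alpha_2 = \alpha^*;$$

and this is a contradiction, since $\alpha^*$ is an extreme point of $K$. □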

2.6 REFERENCES. 

See Chapters 2 and 3 of Macki-Strauss [M-S]. An interesting recent article on 
these matters is Terrell [T] . 
CHAPTER 3: LINEAR TIME-OPTIMAL CONTROL



3.1 Existence of time-optimal controls 

3.2 The Maximum Principle for linear time-optimal control 

3.3 Examples 

3.4 References 

3.1 EXISTENCE OF TIME-OPTIMAL CONTROLS. 

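Consider the linear system of ODE:

$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = Mx(t) + N\alpha(t) \\ x(0) = x^0, \end{cases}$$

for given matrices $M \in \mathbb{M}^{n \times n}$ and $N \in \mathbb{M}^{n \times m}$. We will again take $A$ to be the cube $[-1,1]^m \subset \mathbb{R}^m$.
Define next

$$\text{(P)} \qquad P[\alpha(\cdot)] := -\int_0^\tau 1\,ds = -\tau,$$

where $\tau = \tau(\alpha(\cdot))$ denotes the first time the solution of our (ODE) hits the origin $0$. (If the trajectory never hits $0$, we set $\tau = \infty$.)

OPTIMAL TIME PROBLEM: We are given the starting point $x^0 \in \mathbb{R}^n$, and want to find an optimal control $\alpha^*(\cdot)$ such that

$$P[\alpha^*(\cdot)] = \max_{\alpha(\cdot) \in \mathcal{A}} P[\alpha(\cdot)].$$

Then

$$\tau^* = -P[\alpha^*(\cdot)] \text{ is the minimum time to steer to the origin.}$$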

THEOREM 3.1 (EXISTENCE OF TIME-OPTIMAL CONTROL). Let $x^0 \in \mathbb{R}^n$. Then there exists an optimal bang-bang control $\alpha^*(\cdot)$.

Proof. Let $\tau^* := \inf\{t \mid x^0 \in C(t)\}$. We want to show that $x^0 \in C(\tau^*)$; that is, there exists an optimal control $\alpha^*(\cdot)$ steering $x^0$ to $0$ at time $\tau^*$.

Choose $t_1 \ge t_2 \ge t_3 \ge \dots$ so that $x^0 \in C(t_n)$ and $t_n \to \tau^*$. Since $x^0 \in C(t_n)$, there exists a control $\alpha_n(\cdot) \in \mathcal{A}$ such that

$$x^0 = -\int_0^{t_n} X^{-1}(s)N\alpha_n(s)\,ds.$$

If necessary, redefine $\alpha_n(s)$ to be $0$ for $t_n \le s$. By Alaoglu's Theorem, there exists a subsequence $n_k \to \infty$ and a control $\alpha^*(\cdot)$ so that

$$\alpha_{n_k} \overset{*}{\rightharpoonup} \alpha^*.$$
We assert that $\alpha^*(\cdot)$ is an optimal control. It is easy to check that $\alpha^*(s) = 0$ for $s \ge \tau^*$. Also

$$x^0 = -\int_0^{t_{n_k}} X^{-1}(s)N\alpha_{n_k}(s)\,ds = -\int_0^{t_1} X^{-1}(s)N\alpha_{n_k}(s)\,ds,$$

since $\alpha_{n_k} = 0$ for $s \ge t_{n_k}$. Let $n_k \to \infty$:

$$x^0 = -\int_0^{t_1} X^{-1}(s)N\alpha^*(s)\,ds = -\int_0^{\tau^*} X^{-1}(s)N\alpha^*(s)\,ds$$

because $\alpha^*(s) = 0$ for $s \ge \tau^*$. Hence $x^0 \in C(\tau^*)$, and therefore $\alpha^*(\cdot)$ is optimal.
According to Theorem 2.10 there in fact exists an optimal bang-bang control. □



3.2 THE MAXIMUM PRINCIPLE FOR LINEAR TIME-OPTIMAL 
CONTROL 

The really interesting practical issue now is understanding how to compute an optimal control $\alpha^*(\cdot)$.

DEFINITION. We define $K(t, x^0)$ to be the reachable set for time $t$. That is,

$$K(t, x^0) = \{x^1 \mid \text{there exists } \alpha(\cdot) \in \mathcal{A} \text{ which steers from } x^0 \text{ to } x^1 \text{ at time } t\}.$$

Since $x(\cdot)$ solves (ODE), we have $x^1 \in K(t, x^0)$ if and only if

$$x^1 = X(t)x^0 + X(t)\int_0^t X^{-1}(s)N\alpha(s)\,ds = x(t)$$

for some control $\alpha(\cdot) \in \mathcal{A}$.

THEOREM 3.2 (GEOMETRY OF THE SET K). The set $K(t, x^0)$ is convex and closed.

Proof. 1. (Convexity) Let $x^1, x^2 \in K(t, x^0)$. Then there exist $\alpha_1, \alpha_2 \in \mathcal{A}$ such that

$$x^1 = X(t)x^0 + X(t)\int_0^t X^{-1}(s)N\alpha_1(s)\,ds,$$

$$x^2 = X(t)x^0 + X(t)\int_0^t X^{-1}(s)N\alpha_2(s)\,ds.$$

Let $0 \le \lambda \le 1$. Then

$$\lambda x^1 + (1-\lambda)x^2 = X(t)x^0 + X(t)\int_0^t X^{-1}(s)N\underbrace{\big(\lambda\alpha_1(s) + (1-\lambda)\alpha_2(s)\big)}_{\in \mathcal{A}}\,ds,$$
and hence $\lambda x^1 + (1-\lambda)x^2 \in K(t, x^0)$.

2. (Closedness) Assume $x^k \in K(t, x^0)$ for $k = 1, 2, \dots$ and $x^k \to y$. We must show $y \in K(t, x^0)$. As $x^k \in K(t, x^0)$, there exists $\alpha^k(\cdot) \in \mathcal{A}$ such that

$$x^k = X(t)x^0 + X(t)\int_0^t X^{-1}(s)N\alpha^k(s)\,ds.$$

According to Alaoglu's Theorem, there exist a subsequence $k_j \to \infty$ and $\alpha \in \mathcal{A}$ such that $\alpha^{k_j} \overset{*}{\rightharpoonup} \alpha$. Let $k = k_j \to \infty$ in the expression above, to find

$$y = X(t)x^0 + X(t)\int_0^t X^{-1}(s)N\alpha(s)\,ds.$$

Thus $y \in K(t, x^0)$, and hence $K(t, x^0)$ is closed. □

NOTATION. If $S$ is a set, we write $\partial S$ to denote the boundary of $S$.

Recall that $\tau^*$ denotes the minimum time it takes to steer to $0$, using the optimal control $\alpha^*$. Note that then $0 \in \partial K(\tau^*, x^0)$.

THEOREM 3.3 (PONTRYAGIN MAXIMUM PRINCIPLE FOR LINEAR TIME-OPTIMAL CONTROL). There exists a nonzero vector $h$ such that

$$\text{(M)} \qquad h^T X^{-1}(t)N\alpha^*(t) = \max_{a \in A}\{h^T X^{-1}(t)Na\}$$

for each time $0 \le t \le \tau^*$.

INTERPRETATION. The significance of this assertion is that if we know $h$ then the maximization principle (M) provides us with a formula for computing $\alpha^*(\cdot)$, or at least extracting useful information.

We will see in the next chapter that assertion (M) is a special case of the general Pontryagin Maximum Principle.

Proof. 1. We know $0 \in \partial K(\tau^*, x^0)$. Since $K(\tau^*, x^0)$ is convex, there exists a supporting plane to $K(\tau^*, x^0)$ at $0$; this means that for some $g \ne 0$, we have

$$g \cdot x^1 \le 0 \quad \text{for all } x^1 \in K(\tau^*, x^0).$$

2. Now $x^1 \in K(\tau^*, x^0)$ if and only if there exists $\alpha(\cdot) \in \mathcal{A}$ such that

$$x^1 = X(\tau^*)x^0 + X(\tau^*)\int_0^{\tau^*} X^{-1}(s)N\alpha(s)\,ds.$$

Also

$$0 = X(\tau^*)x^0 + X(\tau^*)\int_0^{\tau^*} X^{-1}(s)N\alpha^*(s)\,ds.$$

Since $g \cdot x^1 \le 0$, we deduce that

$$g^T\Big(X(\tau^*)x^0 + X(\tau^*)\int_0^{\tau^*} X^{-1}(s)N\alpha(s)\,ds\Big) \le 0 = g^T\Big(X(\tau^*)x^0 + X(\tau^*)\int_0^{\tau^*} X^{-1}(s)N\alpha^*(s)\,ds\Big).$$

Define $h^T := g^T X(\tau^*)$. Then

$$\int_0^{\tau^*} h^T X^{-1}(s)N\alpha(s)\,ds \le \int_0^{\tau^*} h^T X^{-1}(s)N\alpha^*(s)\,ds;$$

and therefore

$$\int_0^{\tau^*} h^T X^{-1}(s)N\big(\alpha^*(s) - \alpha(s)\big)\,ds \ge 0$$

for all controls $\alpha(\cdot) \in \mathcal{A}$.

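3. We claim now that the foregoing implies

$$h^T X^{-1}(s)N\alpha^*(s) = \max_{a \in A}\{h^T X^{-1}(s)Na\}$$

for almost every time $s$.

For suppose not; then there would exist a subset $E \subset [0, \tau^*]$ of positive measure, such that

$$h^T X^{-1}(s)N\alpha^*(s) < \max_{a \in A}\{h^T X^{-1}(s)Na\}$$

for $s \in E$. Design a new control $\hat\alpha(\cdot)$ as follows:

$$\hat\alpha(s) = \begin{cases} \alpha^*(s) & (s \notin E) \\ \alpha(s) & (s \in E), \end{cases}$$

where $\alpha(s)$ is selected so that

$$\max_{a \in A}\{h^T X^{-1}(s)Na\} = h^T X^{-1}(s)N\alpha(s).$$

Since $\hat\alpha = \alpha^*$ off $E$, Step 2 applied to $\hat\alpha$ gives

$$\int_E h^T X^{-1}(s)N\big(\alpha^*(s) - \hat\alpha(s)\big)\,ds \ge 0.$$

But the integrand is strictly negative on $E$, a set of positive measure, so the integral is strictly negative. This is a contradiction. □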



For later reference, we pause here to rewrite the foregoing into different notation; this will turn out to be a special case of the general theory developed later in Chapter 4. First of all, define the Hamiltonian

$$H(x, p, a) := (Mx + Na) \cdot p \qquad (x, p \in \mathbb{R}^n,\ a \in A).$$

THEOREM 3.4 (ANOTHER WAY TO WRITE PONTRYAGIN MAXIMUM PRINCIPLE FOR TIME-OPTIMAL CONTROL). Let $\alpha^*(\cdot)$ be a time optimal control and $x^*(\cdot)$ the corresponding response.

Then there exists a function $p^*(\cdot) : [0, \tau^*] \to \mathbb{R}^n$, such that

$$\text{(ODE)} \qquad \dot{x}^*(t) = \nabla_p H(x^*(t), p^*(t), \alpha^*(t)),$$

$$\text{(ADJ)} \qquad \dot{p}^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t)),$$

and

$$\text{(M)} \qquad H(x^*(t), p^*(t), \alpha^*(t)) = \max_{a \in A} H(x^*(t), p^*(t), a).$$

We call (ADJ) the adjoint equations and (M) the maximization principle. The function $p^*(\cdot)$ is the costate.

Proof. 1. Select the vector $h$ as in Theorem 3.3, and consider the system

$$\begin{cases} \dot{p}^*(t) = -M^T p^*(t) \\ p^*(0) = h. \end{cases}$$

The solution is $p^*(t) = e^{-tM^T}h$; and hence

$$p^*(t)^T = h^T X^{-1}(t),$$

since $(e^{-tM^T})^T = e^{-tM} = X^{-1}(t)$.

2. We know from condition (M) in Theorem 3.3 that

$$h^T X^{-1}(t)N\alpha^*(t) = \max_{a \in A}\{h^T X^{-1}(t)Na\}.$$

Since $p^*(t)^T = h^T X^{-1}(t)$, this means that

$$p^*(t)^T\big(Mx^*(t) + N\alpha^*(t)\big) = \max_{a \in A}\{p^*(t)^T(Mx^*(t) + Na)\}.$$

3. Finally, we observe that according to the definition of the Hamiltonian $H$, the dynamical equations for $x^*(\cdot), p^*(\cdot)$ take the form (ODE) and (ADJ), as stated in the Theorem. □
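The identity $p^*(t)^T = h^T X^{-1}(t)$ from Step 1 can be checked numerically. The following is a minimal sketch, assuming a truncated power series is an acceptable stand-in for the matrix exponential; the helper names (`expm`, `costate`, `h_times_x_inverse`) and the sample matrix used to try it out are illustrative choices, not part of the notes.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two rectangular lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def expm(a: Matrix, terms: int = 40) -> Matrix:
    """Truncated series I + A + A^2/2! + ... (adequate for small matrices)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def costate(m: Matrix, h: List[float], t: float) -> List[float]:
    """p*(t) = exp(-t M^T) h, the solution of p' = -M^T p, p(0) = h."""
    e = expm([[-t * v for v in row] for row in transpose(m)])
    return [sum(e[i][j] * h[j] for j in range(len(h))) for i in range(len(h))]


def h_times_x_inverse(m: Matrix, h: List[float], t: float) -> List[float]:
    """The row vector h^T X^{-1}(t), where X^{-1}(t) = exp(-t M)."""
    e = expm([[-t * v for v in row] for row in m])
    return [sum(h[i] * e[i][j] for i in range(len(h))) for j in range(len(h))]
```

For any square `m`, vector `h` and time `t`, the two functions should agree up to rounding, which is exactly the statement $p^*(t)^T = h^T X^{-1}(t)$.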

3.3 EXAMPLES 

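EXAMPLE 1: ROCKET RAILROAD CAR. We recall this example, introduced in §1.2. We have

$$\text{(ODE)} \qquad \dot{x}(t) = \underbrace{\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}}_{M} x(t) + \underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}_{N} \alpha(t)$$

for

$$x(t) = \begin{pmatrix} x^1(t) \\ x^2(t) \end{pmatrix}, \qquad A = [-1, 1].$$

According to the Pontryagin Maximum Principle, there exists $h \ne 0$ such that

$$\text{(M)} \qquad h^T X^{-1}(t)N\alpha^*(t) = \max_{|a| \le 1}\{h^T X^{-1}(t)Na\}.$$

We will extract the interesting fact that an optimal control $\alpha^*$ switches at most one time.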

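We must compute $e^{tM}$. To do so, we observe

$$M^0 = I, \qquad M = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad M^2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = 0;$$

and therefore $M^k = 0$ for all $k \ge 2$. Consequently,

$$e^{tM} = I + tM = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.$$

Then

$$X^{-1}(t) = \begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix},$$

$$X^{-1}(t)N = \begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -t \\ 1 \end{pmatrix},$$

$$h^T X^{-1}(t)N = (h_1, h_2)\begin{pmatrix} -t \\ 1 \end{pmatrix} = -th_1 + h_2.$$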

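The Maximum Principle asserts

$$(-th_1 + h_2)\alpha^*(t) = \max_{|a| \le 1}\{(-th_1 + h_2)a\};$$

and this implies that

$$\alpha^*(t) = \operatorname{sgn}(-th_1 + h_2)$$

for the sign function

$$\operatorname{sgn} x = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0. \end{cases}$$

Therefore the optimal control $\alpha^*$ switches at most once; and if $h_1 = 0$, then $\alpha^*$ is constant.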

Since the optimal control switches at most once, then the control we constructed 
by a geometric method in §1.3 must have been optimal. □ 
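The "at most one switch" conclusion can be seen by sampling the switching function $-th_1 + h_2$, which is affine in $t$ and so changes sign at most once. Below is a small sketch under that reading; the function names and the sampling grid are assumptions made for illustration, and the vector $h$ is simply supplied by hand rather than derived from a given initial point.

```python
def sgn(x: float) -> int:
    """Sign function: 1 for x > 0, 0 for x = 0, -1 for x < 0."""
    return 1 if x > 0 else (-1 if x < 0 else 0)


def rocket_car_control(h1: float, h2: float, t: float) -> int:
    """Candidate optimal control sgn(-t*h1 + h2) for the rocket railroad car."""
    return sgn(-t * h1 + h2)


def count_switches(h1: float, h2: float, t_end: float, steps: int = 10000) -> int:
    """Count sign changes of the control on a uniform grid over [0, t_end]."""
    values = [rocket_car_control(h1, h2, t_end * k / steps) for k in range(steps + 1)]
    nonzero = [v for v in values if v != 0]  # ignore instants where the sign is 0
    return sum(1 for a, b in zip(nonzero, nonzero[1:]) if a != b)
```

With $h = (1, 2)$ the control switches once, at $t = h_2/h_1 = 2$; with $h_1 = 0$ it never switches, matching the remark that $\alpha^*$ is then constant.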

EXAMPLE 2: CONTROL OF A VIBRATING SPRING. Consider next the simple dynamics

$$\ddot{x} + x = \alpha,$$

[Figure: a mass hanging from a spring]

where we interpret the control as an exterior force acting on an oscillating weight (of unit mass) hanging from a spring. Our goal is to design an optimal exterior forcing $\alpha^*(\cdot)$ that brings the motion to a stop in minimum time.

We have $n = 2$, $m = 1$. The individual dynamical equations read:

$$\begin{cases} \dot{x}^1(t) = x^2(t) \\ \dot{x}^2(t) = -x^1(t) + \alpha(t); \end{cases}$$

which in vector notation become

$$\text{(ODE)} \qquad \dot{x}(t) = \underbrace{\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}}_{M} x(t) + \underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}_{N} \alpha(t)$$

for $|\alpha(t)| \le 1$. That is, $A = [-1, 1]$.

Using the maximum principle. We employ the Pontryagin Maximum Principle, which asserts that there exists $h \ne 0$ such that

$$\text{(M)} \qquad h^T X^{-1}(t)N\alpha^*(t) = \max_{a \in A}\{h^T X^{-1}(t)Na\}.$$
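To extract useful information from (M) we must compute $X(\cdot)$. To do so, we observe that the matrix $M$ is skew symmetric, and thus

$$M^0 = I, \qquad M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad M^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I.$$

Therefore

$$M^k = I \text{ if } k = 0, 4, 8, \dots; \quad M^k = M \text{ if } k = 1, 5, 9, \dots; \quad M^k = -I \text{ if } k = 2, 6, \dots; \quad M^k = -M \text{ if } k = 3, 7, \dots;$$

and consequently

$$e^{tM} = I + tM + \frac{t^2}{2!}M^2 + \dots = I + tM - \frac{t^2}{2!}I - \frac{t^3}{3!}M + \frac{t^4}{4!}I + \dots$$

$$= \Big(1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \dots\Big)I + \Big(t - \frac{t^3}{3!} + \frac{t^5}{5!} - \dots\Big)M = \cos t\, I + \sin t\, M = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}.$$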
So we have

$$X^{-1}(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$$

and

$$X^{-1}(t)N = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix};$$

whence

$$h^T X^{-1}(t)N = (h_1, h_2)\begin{pmatrix} -\sin t \\ \cos t \end{pmatrix} = -h_1 \sin t + h_2 \cos t.$$

According to condition (M), for each time $t$ we have

$$(-h_1 \sin t + h_2 \cos t)\alpha^*(t) = \max_{|a| \le 1}\{(-h_1 \sin t + h_2 \cos t)a\}.$$

Therefore

$$\alpha^*(t) = \operatorname{sgn}(-h_1 \sin t + h_2 \cos t).$$

Finding the optimal control. To simplify further, we may assume $h_1^2 + h_2^2 = 1$. Recall the trig identity $\sin(x + y) = \sin x \cos y + \cos x \sin y$, and choose $\delta$ such that $-h_1 = \cos\delta$, $h_2 = \sin\delta$. Then

$$\alpha^*(t) = \operatorname{sgn}(\cos\delta \sin t + \sin\delta \cos t) = \operatorname{sgn}(\sin(t + \delta)).$$

We deduce therefore that $\alpha^*$ switches from $+1$ to $-1$, and vice versa, every $\pi$ units of time.
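The switching instants can be computed directly from $h$: the control $\operatorname{sgn}(\sin(t + \delta))$ changes sign exactly when $t + \delta$ is a multiple of $\pi$. The sketch below assumes $h \ne 0$ is given (in practice it is unknown and must be found from the boundary conditions); the function names are hypothetical.

```python
import math
from typing import List


def switching_phase(h1: float, h2: float) -> float:
    """Angle delta with -h1 = cos(delta), h2 = sin(delta) after normalizing h."""
    norm = math.hypot(h1, h2)
    return math.atan2(h2 / norm, -h1 / norm)


def switch_times(h1: float, h2: float, t_end: float) -> List[float]:
    """Times t in (0, t_end] at which sin(t + delta) changes sign: t = k*pi - delta."""
    delta = switching_phase(h1, h2)
    times: List[float] = []
    k = math.ceil(delta / math.pi)  # smallest k giving a nonnegative time
    while True:
        t = k * math.pi - delta
        if t > t_end:
            break
        if t > 0:
            times.append(t)
        k += 1
    return times
```

Consecutive entries of `switch_times(h1, h2, t_end)` differ by $\pi$, and at each of them the switching function $-h_1 \sin t + h_2 \cos t$ vanishes.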

Geometric interpretation. Next, we figure out the geometric consequences.
When $\alpha \equiv 1$, our (ODE) becomes

$$\begin{cases} \dot{x}^1 = x^2 \\ \dot{x}^2 = -x^1 + 1. \end{cases}$$

In this case, we can calculate that

$$\frac{d}{dt}\big((x^1(t) - 1)^2 + (x^2)^2(t)\big) = 2(x^1(t) - 1)\dot{x}^1(t) + 2x^2(t)\dot{x}^2(t)$$

$$= 2(x^1(t) - 1)x^2(t) + 2x^2(t)(-x^1(t) + 1) = 0.$$

Consequently, the motion satisfies $(x^1(t) - 1)^2 + (x^2)^2(t) = r_1^2$, for some radius $r_1$, and therefore the trajectory lies on a circle with center $(1, 0)$, as illustrated.
If $\alpha \equiv -1$, then (ODE) instead becomes

$$\begin{cases} \dot{x}^1 = x^2 \\ \dot{x}^2 = -x^1 - 1; \end{cases}$$

in which case

$$\frac{d}{dt}\big((x^1(t) + 1)^2 + (x^2)^2(t)\big) = 0.$$

Thus $(x^1(t) + 1)^2 + (x^2)^2(t) = r_2^2$ for some radius $r_2$, and the motion lies on a circle with center $(-1, 0)$.

In summary, to get to the origin we must switch our control $\alpha(\cdot)$ back and forth between the values $\pm 1$, causing the trajectory to switch between lying on circles centered at $(\pm 1, 0)$. The switches occur each $\pi$ units of time.
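The circle property is easy to confirm numerically. The following sketch integrates the spring dynamics with a constant control $a = \pm 1$ using a classical Runge-Kutta step (a choice made here for illustration; any accurate ODE solver would do) and records how far $(x^1 - a)^2 + (x^2)^2$ drifts from its initial value.

```python
import math
from typing import Tuple


def spring_rhs(x1: float, x2: float, a: float) -> Tuple[float, float]:
    """Right-hand side of x1' = x2, x2' = -x1 + a."""
    return x2, -x1 + a


def rk4_step(x1: float, x2: float, a: float, dt: float) -> Tuple[float, float]:
    """One classical fourth-order Runge-Kutta step with constant control a."""
    k1 = spring_rhs(x1, x2, a)
    k2 = spring_rhs(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1], a)
    k3 = spring_rhs(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1], a)
    k4 = spring_rhs(x1 + dt * k3[0], x2 + dt * k3[1], a)
    x1_new = x1 + dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
    x2_new = x2 + dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return x1_new, x2_new


def max_radius_drift(x1: float, x2: float, a: float, t_end: float, steps: int) -> float:
    """Largest deviation of (x1 - a)^2 + x2^2 from its starting value."""
    dt = t_end / steps
    r2_start = (x1 - a) ** 2 + x2 ** 2
    worst = 0.0
    for _ in range(steps):
        x1, x2 = rk4_step(x1, x2, a, dt)
        worst = max(worst, abs((x1 - a) ** 2 + x2 ** 2 - r2_start))
    return worst
```

For instance, starting from $(2, 0.5)$ with $a = 1$ and integrating over one switching interval of length $\pi$, the drift stays at the level of numerical error.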
CHAPTER 4: THE PONTRYAGIN MAXIMUM PRINCIPLE



4.1 Calculus of variations, Hamiltonian dynamics 

4.2 Review of Lagrange multipliers 

4.3 Statement of Pontryagin Maximum Principle 

4.4 Applications and examples 

4.5 Maximum Principle with transversality conditions 

4.6 More applications 

4.7 Maximum Principle with state constraints 

4.8 More applications 

4.9 References 



This important chapter moves us beyond the linear dynamics assumed in Chap- 
ters 2 and 3, to consider much wider classes of optimal control problems, to intro- 
duce the fundamental Pontryagin Maximum Principle, and to illustrate its uses in 
a variety of examples. 

4.1 CALCULUS OF VARIATIONS, HAMILTONIAN DYNAMICS 

We begin in this section with a quick introduction to some variational methods. 
These ideas will later serve as motivation for the Pontryagin Maximum Principle. 

Assume we are given a smooth function $L : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $L = L(x, v)$; $L$ is called the Lagrangian. Let $T > 0$, $x^0, x^1 \in \mathbb{R}^n$ be given.

BASIC PROBLEM OF THE CALCULUS OF VARIATIONS. Find a curve $x^*(\cdot) : [0, T] \to \mathbb{R}^n$ that minimizes the functional

$$\text{(4.1)} \qquad I[x(\cdot)] := \int_0^T L(x(t), \dot{x}(t))\,dt$$

among all functions $x(\cdot)$ satisfying $x(0) = x^0$ and $x(T) = x^1$.

Now assume x*(-) solves our variational problem. The fundamental question is 
this: how can we characterize x*(-)? 

4.1.1 DERIVATION OF EULER-LAGRANGE EQUATIONS. 

NOTATION. We write $L = L(x, v)$, and regard the variable $x$ as denoting position, the variable $v$ as denoting velocity. The partial derivatives of $L$ are

$$\frac{\partial L}{\partial x_i} = L_{x_i}, \qquad \frac{\partial L}{\partial v_i} = L_{v_i} \qquad (1 \le i \le n),$$

and we write

$$\nabla_x L := (L_{x_1}, \dots, L_{x_n}), \qquad \nabla_v L := (L_{v_1}, \dots, L_{v_n}).$$

THEOREM 4.1 (EULER-LAGRANGE EQUATIONS). Let $x^*(\cdot)$ solve the calculus of variations problem. Then $x^*(\cdot)$ solves the Euler-Lagrange differential equations:

$$\text{(E-L)} \qquad \frac{d}{dt}\big[\nabla_v L(x^*(t), \dot{x}^*(t))\big] = \nabla_x L(x^*(t), \dot{x}^*(t)).$$

The significance of the preceding theorem is that if we can solve the Euler-Lagrange equations (E-L), then the solution of our original calculus of variations problem (assuming it exists) will be among the solutions.

Note that (E-L) is a quasilinear system of $n$ second-order ODE. The $i$-th component of the system reads

$$\frac{d}{dt}\big[L_{v_i}(x^*(t), \dot{x}^*(t))\big] = L_{x_i}(x^*(t), \dot{x}^*(t)).$$

Proof. 1. Select any smooth curve $y : [0, T] \to \mathbb{R}^n$, satisfying $y(0) = y(T) = 0$.
Define

$$i(\tau) := I[x(\cdot) + \tau y(\cdot)]$$

for $\tau \in \mathbb{R}$ and $x(\cdot) = x^*(\cdot)$. (To simplify we omit the superscript $*$.) Notice that $x(\cdot) + \tau y(\cdot)$ takes on the proper values at the endpoints. Hence, since $x(\cdot)$ is a minimizer, we have

$$i(\tau) \ge I[x(\cdot)] = i(0).$$

Consequently $i(\cdot)$ has a minimum at $\tau = 0$, and so

$$i'(0) = 0.$$

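2. We must compute $i'(\tau)$. Note first that

$$i(\tau) = \int_0^T L(x(t) + \tau y(t), \dot{x}(t) + \tau\dot{y}(t))\,dt;$$

and hence

$$i'(\tau) = \int_0^T \Big(\sum_{i=1}^n L_{x_i}(x(t) + \tau y(t), \dot{x}(t) + \tau\dot{y}(t))\,y^i(t) + \sum_{i=1}^n L_{v_i}(x(t) + \tau y(t), \dot{x}(t) + \tau\dot{y}(t))\,\dot{y}^i(t)\Big)\,dt.$$

Let $\tau = 0$. Then

$$0 = i'(0) = \sum_{i=1}^n \int_0^T L_{x_i}(x(t), \dot{x}(t))\,y^i(t) + L_{v_i}(x(t), \dot{x}(t))\,\dot{y}^i(t)\,dt.$$

This equality holds for all choices of $y : [0, T] \to \mathbb{R}^n$, with $y(0) = y(T) = 0$.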

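3. Fix any $1 \le j \le n$. Choose $y(\cdot)$ so that

$$y^i(t) \equiv 0 \quad (i \ne j), \qquad y^j(t) = \psi(t),$$

where $\psi$ is an arbitrary function. Use this choice of $y(\cdot)$ above:

$$0 = \int_0^T L_{x_j}(x(t), \dot{x}(t))\,\psi(t) + L_{v_j}(x(t), \dot{x}(t))\,\psi'(t)\,dt.$$

Integrate by parts, recalling that $\psi(0) = \psi(T) = 0$:

$$0 = \int_0^T \Big[L_{x_j}(x(t), \dot{x}(t)) - \frac{d}{dt}\big(L_{v_j}(x(t), \dot{x}(t))\big)\Big]\psi(t)\,dt.$$

This holds for all $\psi : [0, T] \to \mathbb{R}$ with $\psi(0) = \psi(T) = 0$, and therefore

$$L_{x_j}(x(t), \dot{x}(t)) - \frac{d}{dt}\big(L_{v_j}(x(t), \dot{x}(t))\big) = 0$$

for all times $0 \le t \le T$. To see this, observe that otherwise $L_{x_j} - \frac{d}{dt}(L_{v_j})$ would be, say, positive on some subinterval $I \subseteq [0, T]$. Choose $\psi \equiv 0$ off $I$, $\psi > 0$ on $I$.
Then

$$\int_0^T \Big(L_{x_j} - \frac{d}{dt}(L_{v_j})\Big)\psi\,dt > 0,$$

a contradiction. □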
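A small numerical experiment illustrates the argument. Assume, purely for illustration, the Lagrangian $L(x, v) = \tfrac12 v^2 + \tfrac12 x^2$ with $n = 1$, $T = 1$, $x(0) = 0$, $x(1) = 1$. The Euler-Lagrange equation is then $\ddot{x} = x$, with solution $x^*(t) = \sinh t / \sinh 1$, and perturbing $x^*$ by $\tau y$ with $y(0) = y(1) = 0$ should only increase the functional. The quadrature rule and all names below are choices made for this sketch.

```python
import math
from typing import Callable


def action(x: Callable[[float], float], x_dot: Callable[[float], float],
           t_end: float = 1.0, n: int = 2000) -> float:
    """Midpoint-rule approximation of I[x] = integral of (x_dot^2 + x^2)/2 over [0, t_end]."""
    dt = t_end / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        total += 0.5 * (x_dot(t) ** 2 + x(t) ** 2) * dt
    return total


def x_star(t: float) -> float:
    """Solution of x'' = x with x(0) = 0, x(1) = 1."""
    return math.sinh(t) / math.sinh(1.0)


def x_star_dot(t: float) -> float:
    return math.cosh(t) / math.sinh(1.0)


def perturbed_action(tau: float) -> float:
    """i(tau) = I[x* + tau*y] for the test variation y(t) = sin(pi t)."""
    return action(
        lambda t: x_star(t) + tau * math.sin(math.pi * t),
        lambda t: x_star_dot(t) + tau * math.pi * math.cos(math.pi * t),
    )
```

Here `perturbed_action(0.0)` approximates $I[x^*] = \tfrac12 \coth 1$, and values at $\tau = \pm 0.1$ are larger, consistent with $i(\cdot)$ having its minimum at $\tau = 0$.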

4.1.2 CONVERSION TO HAMILTON'S EQUATIONS. 

DEFINITION. For the given curve $x(\cdot)$, define

$$p(t) := \nabla_v L(x(t), \dot{x}(t)) \qquad (0 \le t \le T).$$

We call $p(\cdot)$ the generalized momentum.

Our intention now is to rewrite the Euler-Lagrange equations as a system of first-order ODE for $x(\cdot), p(\cdot)$.

IMPORTANT HYPOTHESIS: Assume that for all $x, p \in \mathbb{R}^n$, we can solve the equation

$$\text{(4.2)} \qquad p = \nabla_v L(x, v)$$

for $v$ in terms of $x$ and $p$. That is, we suppose we can solve the identity (4.2) for $v = v(x, p)$.

DEFINITION. Define the dynamical systems Hamiltonian $H : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ by the formula

$$H(x, p) = p \cdot v(x, p) - L(x, v(x, p)),$$

where $v$ is defined above.

NOTATION. The partial derivatives of $H$ are

$$\frac{\partial H}{\partial x_i} = H_{x_i}, \qquad \frac{\partial H}{\partial p_i} = H_{p_i} \qquad (1 \le i \le n),$$

and we write

$$\nabla_x H := (H_{x_1}, \dots, H_{x_n}), \qquad \nabla_p H := (H_{p_1}, \dots, H_{p_n}).$$

THEOREM 4.2 (HAMILTONIAN DYNAMICS). Let $x(\cdot)$ solve the Euler-Lagrange equations (E-L) and define $p(\cdot)$ as above. Then the pair $(x(\cdot), p(\cdot))$ solves Hamilton's equations:

$$\text{(H)} \qquad \begin{cases} \dot{x}(t) = \nabla_p H(x(t), p(t)) \\ \dot{p}(t) = -\nabla_x H(x(t), p(t)). \end{cases}$$

Furthermore, the mapping $t \mapsto H(x(t), p(t))$ is constant.



Proof. Recall that $H(x, p) = p \cdot v(x, p) - L(x, v(x, p))$, where $v = v(x, p)$ or, equivalently, $p = \nabla_v L(x, v)$. Then

$$\nabla_x H(x, p) = p \cdot \nabla_x v - \nabla_x L(x, v(x, p)) - \nabla_v L(x, v(x, p)) \cdot \nabla_x v = -\nabla_x L(x, v(x, p))$$

because $p = \nabla_v L$. Now $p(t) = \nabla_v L(x(t), \dot{x}(t))$ if and only if $\dot{x}(t) = v(x(t), p(t))$.
Therefore (E-L) implies

$$\dot{p}(t) = \nabla_x L(x(t), \dot{x}(t)) = \nabla_x L(x(t), v(x(t), p(t))) = -\nabla_x H(x(t), p(t)).$$

Also

$$\nabla_p H(x, p) = v(x, p) + p \cdot \nabla_p v - \nabla_v L \cdot \nabla_p v = v(x, p)$$

since $p = \nabla_v L(x, v(x, p))$. This implies

$$\nabla_p H(x(t), p(t)) = v(x(t), p(t)).$$

But

$$p(t) = \nabla_v L(x(t), \dot{x}(t))$$

and so $\dot{x}(t) = v(x(t), p(t))$. Therefore

$$\dot{x}(t) = \nabla_p H(x(t), p(t)).$$

Finally note that

$$\frac{d}{dt} H(x(t), p(t)) = \nabla_x H \cdot \dot{x}(t) + \nabla_p H \cdot \dot{p}(t) = \nabla_x H \cdot \nabla_p H + \nabla_p H \cdot (-\nabla_x H) = 0.$$

□



A PHYSICAL EXAMPLE. We define the Lagrangian

$$L(x, v) = \frac{m|v|^2}{2} - V(x),$$

which we interpret as the kinetic energy minus the potential energy $V$. Then

$$\nabla_x L = -\nabla V(x), \qquad \nabla_v L = mv.$$

Therefore the Euler-Lagrange equation is

$$m\ddot{x}(t) = -\nabla V(x(t)),$$

which is Newton's law. Furthermore

$$p = \nabla_v L(x, v) = mv$$

is the momentum, and the Hamiltonian is

$$H(x, p) = p \cdot \frac{p}{m} - L\Big(x, \frac{p}{m}\Big) = \frac{|p|^2}{m} - \frac{m}{2}\Big|\frac{p}{m}\Big|^2 + V(x) = \frac{|p|^2}{2m} + V(x),$$

the sum of the kinetic and potential energies. For this example, Hamilton's equations read

$$\begin{cases} \dot{x}(t) = \dfrac{p(t)}{m} \\ \dot{p}(t) = -\nabla V(x(t)). \end{cases}$$

□
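To see the conservation of $H$ along solutions in a concrete case, one can integrate Hamilton's equations numerically. The sketch below assumes the one-dimensional potential $V(x) = \tfrac12 x^2$ (a harmonic oscillator, chosen only as an example) and uses a classical Runge-Kutta integrator; the function names are hypothetical.

```python
from typing import Tuple


def hamiltonian(x: float, p: float, m: float) -> float:
    """H(x, p) = p^2 / (2m) + V(x) with the assumed potential V(x) = x^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * x * x


def hamilton_rhs(x: float, p: float, m: float) -> Tuple[float, float]:
    """Hamilton's equations: x' = p / m, p' = -V'(x) = -x."""
    return p / m, -x


def integrate(x: float, p: float, m: float, t_end: float, steps: int) -> Tuple[float, float]:
    """Advance (x, p) to time t_end with classical fourth-order Runge-Kutta steps."""
    dt = t_end / steps
    for _ in range(steps):
        k1 = hamilton_rhs(x, p, m)
        k2 = hamilton_rhs(x + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1], m)
        k3 = hamilton_rhs(x + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1], m)
        k4 = hamilton_rhs(x + dt * k3[0], p + dt * k3[1], m)
        x += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        p += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return x, p
```

Comparing `hamiltonian` before and after a call to `integrate` shows the energy unchanged up to the integrator's error, in line with Theorem 4.2.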

4.2 REVIEW OF LAGRANGE MULTIPLIERS. 

CONSTRAINTS AND LAGRANGE MULTIPLIERS. What first strikes us about general optimal control problems is the occurrence of many constraints, most notably that the dynamics be governed by the differential equation

$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = f(x(t), \alpha(t)) & (t \ge 0) \\ x(0) = x^0. \end{cases}$$

This is in contrast to standard calculus of variations problems, as discussed in §4.1, where we could take any curve $x(\cdot)$ as a candidate for a minimizer.

Now it is a general principle of variational and optimization theory that "constraints create Lagrange multipliers" and furthermore that these Lagrange multipliers often "contain valuable information". This section provides a quick review of the standard method of Lagrange multipliers in solving multivariable constrained optimization problems.

UNCONSTRAINED OPTIMIZATION. Suppose first that we wish to find a maximum point for a given smooth function $f : \mathbb{R}^n \to \mathbb{R}$. In this case there is no constraint, and therefore if $f(x^*) = \max_{x \in \mathbb{R}^n} f(x)$, then $x^*$ is a critical point of $f$:

$$\nabla f(x^*) = 0.$$
CONSTRAINED OPTIMIZATION. We modify the problem above by introducing the region

$$R := \{x \in \mathbb{R}^n \mid g(x) \le 0\},$$

determined by some given function $g : \mathbb{R}^n \to \mathbb{R}$. Suppose $x^* \in R$ and $f(x^*) = \max_{x \in R} f(x)$. We would like a characterization of $x^*$ in terms of the gradients of $f$ and $g$.

Case 1: $x^*$ lies in the interior of $R$. Then the constraint is inactive, and so

$$\text{(4.3)} \qquad \nabla f(x^*) = 0.$$

[FIGURE 1: gradient of f]

Case 2: $x^*$ lies on $\partial R$. We look at the direction of the vector $\nabla f(x^*)$. A geometric picture like Figure 1 is impossible; for if it were so, then $f(y^*)$ would be greater than $f(x^*)$ for some other point $y^* \in \partial R$. So it must be that $\nabla f(x^*)$ is perpendicular to $\partial R$ at $x^*$, as shown in Figure 2.

[FIGURE 2]
Since $\nabla g$ is perpendicular to $\partial R = \{g = 0\}$, it follows that $\nabla f(x^*)$ is parallel to $\nabla g(x^*)$. Therefore

$$\text{(4.4)} \qquad \nabla f(x^*) = \lambda \nabla g(x^*)$$

for some real number $\lambda$, called a Lagrange multiplier.
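As a concrete check of (4.4), take the illustrative problem of maximizing $f(x, y) = x + y$ over the unit disc $g(x, y) = x^2 + y^2 - 1 \le 0$. Since $\nabla f = (1, 1)$ never vanishes, the maximum must lie on the boundary (Case 2), and one expects $x^* = (1/\sqrt2, 1/\sqrt2)$ with $\lambda = 1/\sqrt2$. The sketch below finds the boundary maximizer by brute force and then recovers $\lambda$; the example problem and all names are assumptions made for illustration.

```python
import math
from typing import Tuple


def grad_f(x: float, y: float) -> Tuple[float, float]:
    """Gradient of f(x, y) = x + y."""
    return 1.0, 1.0


def grad_g(x: float, y: float) -> Tuple[float, float]:
    """Gradient of g(x, y) = x^2 + y^2 - 1."""
    return 2.0 * x, 2.0 * y


def maximizer_on_circle(n: int = 100000) -> Tuple[float, float]:
    """Brute-force maximizer of f over n sample points of the circle g = 0."""
    best_k = max(range(n),
                 key=lambda k: math.cos(2 * math.pi * k / n) + math.sin(2 * math.pi * k / n))
    theta = 2 * math.pi * best_k / n
    return math.cos(theta), math.sin(theta)


def multiplier(x: float, y: float) -> float:
    """Least-squares lambda in grad f = lambda * grad g at the point (x, y)."""
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return (fx * gx + fy * gy) / (gx * gx + gy * gy)
```

At the computed maximizer, `grad_f` and `multiplier(x, y)` times `grad_g` agree up to the sampling resolution, which is the parallel-gradient condition.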

CRITIQUE. The foregoing argument is in fact incomplete, since we implicitly assumed that $\nabla g(x^*) \ne 0$, in which case the Implicit Function Theorem implies that the set $\{g = 0\}$ is an $(n-1)$-dimensional surface near $x^*$ (as illustrated).

If instead $\nabla g(x^*) = 0$, the set $\{g = 0\}$ need not have this simple form near $x^*$; and the reasoning discussed as Case 2 above is not complete.

The correct statement is this:

(4.5) There exist real numbers $\lambda$ and $\mu$, not both equal to 0, such that $\mu \nabla f(x^*) = \lambda \nabla g(x^*)$.

If $\mu \ne 0$, we can divide by $\mu$ and convert to the formulation (4.4). And if $\nabla g(x^*) = 0$, we can take $\lambda = 1$, $\mu = 0$, making assertion (4.5) correct (if not particularly useful). □

4.3 STATEMENT OF PONTRYAGIN MAXIMUM PRINCIPLE

We come now to the key assertion of this chapter, the theoretically interesting and practically useful theorem that if $\alpha^*(\cdot)$ is an optimal control, then there exists a function $p^*(\cdot)$, called the costate, that satisfies a certain maximization principle. We should think of the function $p^*(\cdot)$ as a sort of Lagrange multiplier, which appears owing to the constraint that the optimal curve $x^*(\cdot)$ must satisfy (ODE). And just as conventional Lagrange multipliers are useful for actual calculations, so also will be the costate.

We quote Francis Clarke [C2]: "The maximum principle was, in fact, the culmination of a long search in the calculus of variations for a comprehensive multiplier rule, which is the correct way to view it: p(t) is a "Lagrange multiplier" ... It makes optimal control a design tool, whereas the calculus of variations was a way to study nature."

4.3.1 FIXED TIME, FREE ENDPOINT PROBLEM. Let us review the basic set-up for our control problem.

We are given $A \subseteq \mathbb{R}^m$ and also $f : \mathbb{R}^n \times A \to \mathbb{R}^n$, $x^0 \in \mathbb{R}^n$. We as before denote the set of admissible controls by

$$\mathcal{A} = \{\alpha(\cdot) : [0, \infty) \to A \mid \alpha(\cdot) \text{ is measurable}\}.$$
Then given $\alpha(\cdot) \in \mathcal{A}$, we solve for the corresponding evolution of our system:

$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = f(x(t), \alpha(t)) & (t \ge 0) \\ x(0) = x^0. \end{cases}$$

We also introduce the payoff functional

$$\text{(P)} \qquad P[\alpha(\cdot)] = \int_0^T r(x(t), \alpha(t))\,dt + g(x(T)),$$

where the terminal time $T > 0$, running payoff $r : \mathbb{R}^n \times A \to \mathbb{R}$ and terminal payoff $g : \mathbb{R}^n \to \mathbb{R}$ are given.

BASIC PROBLEM: Find a control $\alpha^*(\cdot)$ such that

$$P[\alpha^*(\cdot)] = \max_{\alpha(\cdot) \in \mathcal{A}} P[\alpha(\cdot)].$$

The Pontryagin Maximum Principle, stated below, asserts the existence of a function $p^*(\cdot)$, which together with the optimal trajectory $x^*(\cdot)$ satisfies an analog of Hamilton's ODE from §4.1. For this, we will need an appropriate Hamiltonian:

DEFINITION. The control theory Hamiltonian is the function

$$H(x, p, a) := f(x, a) \cdot p + r(x, a) \qquad (x, p \in \mathbb{R}^n,\ a \in A).$$

THEOREM 4.3 (PONTRYAGIN MAXIMUM PRINCIPLE). Assume $\alpha^*(\cdot)$ is optimal for (ODE), (P) and $x^*(\cdot)$ is the corresponding trajectory.
Then there exists a function $p^* : [0, T] \to \mathbb{R}^n$ such that

$$\text{(ODE)} \qquad \dot{x}^*(t) = \nabla_p H(x^*(t), p^*(t), \alpha^*(t)),$$

$$\text{(ADJ)} \qquad \dot{p}^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t)),$$

and

$$\text{(M)} \qquad H(x^*(t), p^*(t), \alpha^*(t)) = \max_{a \in A} H(x^*(t), p^*(t), a) \qquad (0 \le t \le T).$$

In addition,

the mapping $t \mapsto H(x^*(t), p^*(t), \alpha^*(t))$ is constant.

Finally, we have the terminal condition

$$\text{(T)} \qquad p^*(T) = \nabla g(x^*(T)).$$

REMARKS AND INTERPRETATIONS. (i) The identities (ADJ) are the adjoint equations and (M) the maximization principle. Notice that (ODE) and (ADJ) resemble the structure of Hamilton's equations, discussed in §4.1.

We also call (T) the transversality condition and will discuss its significance later.

(ii) More precisely, formula (ODE) says that for $1 \le i \le n$, we have

$$\dot{x}^{i*}(t) = H_{p_i}(x^*(t), p^*(t), \alpha^*(t)) = f^i(x^*(t), \alpha^*(t)),$$

which is just the original equation of motion. Likewise, (ADJ) says

$$\dot{p}^{i*}(t) = -H_{x_i}(x^*(t), p^*(t), \alpha^*(t)) = -\sum_{j=1}^n p^{j*}(t)\,f^j_{x_i}(x^*(t), \alpha^*(t)) - r_{x_i}(x^*(t), \alpha^*(t)).$$

□

4.3.2 FREE TIME, FIXED ENDPOINT PROBLEM. Let us next record 
the appropriate form of the Maximum Principle for a fixed endpoint problem. 

As before, given a control $\alpha(\cdot) \in \mathcal{A}$, we solve for the corresponding evolution of our system:

$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = f(x(t), \alpha(t)) & (t \ge 0) \\ x(0) = x^0. \end{cases}$$

Assume now that a target point $x^1 \in \mathbb{R}^n$ is given. We introduce then the payoff functional

$$\text{(P)} \qquad P[\alpha(\cdot)] = \int_0^\tau r(x(t), \alpha(t))\,dt.$$

Here $r : \mathbb{R}^n \times A \to \mathbb{R}$ is the given running payoff, and $\tau = \tau[\alpha(\cdot)] \le \infty$ denotes the first time the solution of (ODE) hits the target point $x^1$.

As before, the basic problem is to find an optimal control $\alpha^*(\cdot)$ such that

$$P[\alpha^*(\cdot)] = \max_{\alpha(\cdot) \in \mathcal{A}} P[\alpha(\cdot)].$$

Define the Hamiltonian $H$ as in §4.3.1.

THEOREM 4.4 (PONTRYAGIN MAXIMUM PRINCIPLE). Assume $\alpha^*(\cdot)$ is optimal for (ODE), (P) and $x^*(\cdot)$ is the corresponding trajectory.
Then there exists a function $p^* : [0, \tau^*] \to \mathbb{R}^n$ such that

$$\text{(ODE)} \qquad \dot{x}^*(t) = \nabla_p H(x^*(t), p^*(t), \alpha^*(t)),$$

$$\text{(ADJ)} \qquad \dot{p}^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t)),$$

and

$$\text{(M)} \qquad H(x^*(t), p^*(t), \alpha^*(t)) = \max_{a \in A} H(x^*(t), p^*(t), a) \qquad (0 \le t \le \tau^*).$$
Also,

$$H(x^*(t), p^*(t), \alpha^*(t)) \equiv 0 \qquad (0 \le t \le \tau^*).$$

Here $\tau^*$ denotes the first time the trajectory $x^*(\cdot)$ hits the target point $x^1$. We call $x^*(\cdot)$ the state of the optimally controlled system and $p^*(\cdot)$ the costate.

REMARK AND WARNING. More precisely, we should define

$$H(x, p, q, a) = f(x, a) \cdot p + r(x, a)q \qquad (q \in \mathbb{R}).$$

A more careful statement of the Maximum Principle says "there exists a constant $q \ge 0$ and a function $p^* : [0, \tau^*] \to \mathbb{R}^n$ such that (ODE), (ADJ), and (M) hold".

If $q > 0$, we can renormalize to get $q = 1$, as we have done above. If $q = 0$, then $H$ does not depend on the running payoff $r$ and in this case the Pontryagin Maximum Principle is not useful. This is a so-called "abnormal problem".

Compare these comments with the critique of the usual Lagrange multiplier method at the end of §4.2, and see also the proof in §A.5 of the Appendix. □

4.4 APPLICATIONS AND EXAMPLES 

HOW TO USE THE MAXIMUM PRINCIPLE. We mentioned earlier 
that the costate p*(-) can be interpreted as a sort of Lagrange multiplier. 

Calculations with Lagrange multipliers. Recall our discussion in §4.2 about finding a point $x^*$ that maximizes a function $f$, subject to the requirement that $g \le 0$. Now $x^* = (x_1^*, \dots, x_n^*)^T$ has $n$ unknown components we must find. Somewhat unexpectedly, it turns out in practice to be easier to solve (4.4) for the $n + 1$ unknowns $x_1^*, \dots, x_n^*$ and $\lambda$. We repeat this key insight: it is actually easier to solve the problem if we add a new unknown, namely the Lagrange multiplier. Worked examples abound in multivariable calculus books.

Calculations with the costate. This same principle is valid for our much more complicated control theory problems: it is usually best not just to look for an optimal control $\alpha^*(\cdot)$ and an optimal trajectory $x^*(\cdot)$ alone, but also to look as well for the costate $p^*(\cdot)$. In practice, we add the equations (ADJ) and (M) to (ODE) and then try to solve for $\alpha^*(\cdot), x^*(\cdot)$ and for $p^*(\cdot)$.

The following examples show how this works in practice, in certain cases for 
which we can actually solve everything explicitly or, failing that, at least deduce 
some useful information. 
4.4.1 EXAMPLE 1: LINEAR TIME-OPTIMAL CONTROL. For this example, let $A$ denote the cube $[-1, 1]^n$ in $\mathbb{R}^n$. We consider again the linear dynamics:

$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = Mx(t) + N\alpha(t) \\ x(0) = x^0, \end{cases}$$

for the payoff functional

$$\text{(P)} \qquad P[\alpha(\cdot)] = -\int_0^\tau 1\,dt = -\tau,$$

where $\tau$ denotes the first time the trajectory hits the target point $x^1 = 0$. We have $r \equiv -1$, and so

$$H(x, p, a) = f \cdot p + r = (Mx + Na) \cdot p - 1.$$

In Chapter 3 we introduced the Hamiltonian $H = (Mx + Na) \cdot p$, which differs by a constant from the present $H$. We can redefine $H$ in Chapter 3 to match the present theory: compare then Theorems 3.4 and 4.4. □

4.4.2 EXAMPLE 2: CONTROL OF PRODUCTION AND CONSUMPTION. We return to Example 1 in Chapter 1, a model for optimal consumption in a simple economy. Recall that

$x(t)$ = output of economy at time $t$,

$\alpha(t)$ = fraction of output reinvested at time $t$.

We have the constraint $0 \le \alpha(t) \le 1$; that is, $A = [0, 1] \subset \mathbb{R}$. The economy evolves according to the dynamics

$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = \alpha(t)x(t) & (0 \le t \le T) \\ x(0) = x^0, \end{cases}$$

where $x^0 > 0$ and we have set the growth factor $k = 1$. We want to maximize the total consumption

$$\text{(P)} \qquad P[\alpha(\cdot)] := \int_0^T (1 - \alpha(t))x(t)\,dt.$$

How can we characterize an optimal control $\alpha^*(\cdot)$?

Introducing the maximum principle. We apply the Pontryagin Maximum Principle, and to simplify notation we will not write the superscripts $*$ for the optimal control, trajectory, etc. We have $n = m = 1$,

$$f(x, a) = xa, \qquad g \equiv 0, \qquad r(x, a) = (1 - a)x;$$

and therefore

$$H(x, p, a) = f(x, a)p + r(x, a) = pxa + (1 - a)x = x + ax(p - 1).$$
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ct(t) = 



The dynamical equation is 
(ODE) x(t) = H p = a{t)x(t), 

and the adjoint equation is 

(ADJ) p(t) = -H x = -1 - a(t)(p(t) - 1). 

The terminal condition reads 

(T) p(T) = g x (x(T)) = 0. 

Lastly, the maximality principle asserts 
(M) H(x(t),p(t),a(t)) = max {x(t) + ax(t)(p(t) - 1)}. 

0<a<l 

Using the maximum principle. We now deduce useful information from 
(ODE), (ADJ), (M) and (T). 

According to (M), at each time t the control value a(t) must be selected to 
maximize a(p(t) — 1) for < a < 1. This is so, since x(t) > 0. Thus 

1 if p(t) > 1 
if p(t) < 1. 

Hence if we know p(-), we can design the optimal control «(•)• 

So next we must solve for the costate $p(\cdot)$. We know from (ADJ) and (T) that

$\begin{cases} \dot p(t) = -1 - \alpha(t)[p(t) - 1] & (0 \le t \le T) \\ p(T) = 0. \end{cases}$

Since $p(T) = 0$, we deduce by continuity that $p(t) \le 1$ for $t$ close to $T$, $t < T$. Thus
$\alpha(t) = 0$ for such values of $t$. Therefore $\dot p(t) = -1$, and consequently $p(t) = T - t$
for times $t$ in this interval. So we have that $p(t) = T - t$ so long as $p(t) \le 1$. And
this holds for $T - 1 \le t \le T$.

But for times $t \le T - 1$, with $t$ near $T - 1$, we have $\alpha(t) = 1$; and so (ADJ)
becomes

$\dot p(t) = -1 - (p(t) - 1) = -p(t)$.

Since $p(T - 1) = 1$, we see that $p(t) = e^{T-1-t} > 1$ for all times $0 \le t < T - 1$. In
particular there are no switches in the control over this time interval.

Restoring the superscript * to our notation, we consequently deduce that an
optimal control is

$\alpha^*(t) = \begin{cases} 1 & \text{if } 0 \le t \le t^* \\ 0 & \text{if } t^* < t \le T \end{cases}$

for the optimal switching time $t^* = T - 1$.
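
As a sanity check on the switching time, here is a minimal Python sketch. It assumes $T \ge 1$ and, unlike the Maximum Principle argument, only compares controls that reinvest everything up to a time $s$ and consume everything afterwards. For such a control $x(s) = x^0 e^s$, so the payoff is $(T - s) x^0 e^s$. The function names and the grid size are illustrative choices, not part of the notes.

```python
import math

def payoff(switch_time, T, x0=1.0):
    """Total consumption when alpha = 1 on [0, s] and alpha = 0 on (s, T].

    While reinvesting, x grows like x0 * exp(t); afterwards x stays at
    x0 * exp(s) and all of it is consumed for the remaining T - s time units.
    """
    s = switch_time
    return (T - s) * x0 * math.exp(s)

def costate(t, T):
    """Costate p(t) found in the text (assumes T >= 1)."""
    return T - t if t >= T - 1 else math.exp(T - 1 - t)

def best_switch_time(T, grid=10001):
    """Brute-force search over single switching times in [0, T]."""
    candidates = [T * i / (grid - 1) for i in range(grid)]
    return max(candidates, key=lambda s: payoff(s, T))
```

For example, `best_switch_time(3.0)` should land within one grid cell of $T - 1 = 2$, and `costate` crosses the value 1 exactly there.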
We leave it as an exercise to compute the switching time if the growth constant
$k \ne 1$.
4.4.3 EXAMPLE 3: A SIMPLE LINEAR-QUADRATIC REGULATOR.
We take $n = m = 1$ for this example, and consider the simple linear dynamics

(ODE)   $\begin{cases} \dot x(t) = x(t) + \alpha(t) \\ x(0) = x^0, \end{cases}$

with the quadratic cost functional

$\int_0^T x(t)^2 + \alpha(t)^2 \, dt$,

which we want to minimize. So we want to maximize the payoff functional

(P)   $P[\alpha(\cdot)] = -\int_0^T x(t)^2 + \alpha(t)^2 \, dt$.

For this problem, the values of the controls are not constrained; that is, $A = \mathbb{R}$.

Introducing the maximum principle. To simplify notation further we again
drop the superscripts *. We have $n = m = 1$,

$f(x, a) = x + a$,  $g \equiv 0$,  $r(x, a) = -x^2 - a^2$;

and hence

$H(x, p, a) = fp + r = (x + a)p - (x^2 + a^2)$.

The maximality condition becomes

(M)   $H(x(t), p(t), \alpha(t)) = \max_{a \in \mathbb{R}} \{ -(x(t)^2 + a^2) + p(t)(x(t) + a) \}$.

We calculate the maximum on the right hand side by setting $H_a = -2a + p = 0$.
Thus $a = \frac{p}{2}$, and so

$\alpha(t) = \frac{p(t)}{2}$.

The dynamical equations are therefore

(ODE)   $\dot x(t) = x(t) + \frac{p(t)}{2}$

and

(ADJ)   $\dot p(t) = -H_x = 2x(t) - p(t)$.

Moreover $x(0) = x^0$, and the terminal condition is

(T)   $p(T) = 0$.

Using the Maximum Principle. So we must look at the system of equations

$\begin{pmatrix} \dot x \\ \dot p \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 1/2 \\ 2 & -1 \end{pmatrix}}_{M} \begin{pmatrix} x \\ p \end{pmatrix}$,

the general solution of which is

$\begin{pmatrix} x(t) \\ p(t) \end{pmatrix} = e^{tM} \begin{pmatrix} x^0 \\ p^0 \end{pmatrix}$.

Since we know $x^0$, the task is to choose $p^0$ so that $p(T) = 0$.
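
One direct way to choose $p^0$ is to write out $e^{tM}$ explicitly. The Python sketch below assumes the coefficient matrix is $M = \begin{pmatrix} 1 & 1/2 \\ 2 & -1 \end{pmatrix}$, read off from $\dot x = x + p/2$, $\dot p = 2x - p$. Since $M^2 = 2I$, the exponential series collapses to $\cosh(\sqrt 2 t) I + \frac{\sinh(\sqrt 2 t)}{\sqrt 2} M$. The function names are illustrative only.

```python
import math

SQRT2 = math.sqrt(2.0)

def exp_tM(t):
    """e^{tM} for M = [[1, 1/2], [2, -1]].

    M has trace 0 and determinant -2, so M*M = 2*I and the exponential
    equals cosh(sqrt(2) t) I + sinh(sqrt(2) t)/sqrt(2) M.
    """
    c = math.cosh(SQRT2 * t)
    s = math.sinh(SQRT2 * t) / SQRT2
    return [[c + s, 0.5 * s],
            [2.0 * s, c - s]]

def initial_costate(x0, T):
    """Choose p0 so that the second component of e^{TM} (x0, p0) vanishes."""
    E = exp_tM(T)
    return -E[1][0] * x0 / E[1][1]

def state_and_costate(t, x0, T):
    """(x(t), p(t)) for the trajectory satisfying x(0) = x0 and p(T) = 0."""
    E = exp_tM(t)
    p0 = initial_costate(x0, T)
    return (E[0][0] * x0 + E[0][1] * p0,
            E[1][0] * x0 + E[1][1] * p0)
```

The control then follows from $\alpha(t) = p(t)/2$. This shooting approach works here only because the system is linear and two-dimensional.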

Feedback controls. An elegant way to do so is to try to find an optimal control
in linear feedback form; that is, to look for a function $c(\cdot) : [0, T] \to \mathbb{R}$ for which

$\alpha(t) = c(t) x(t)$.

We henceforth suppose that an optimal feedback control of this form exists,
and attempt to calculate $c(\cdot)$. Now

$\frac{p(t)}{2} = \alpha(t) = c(t) x(t)$;

whence $c(t) = \frac{p(t)}{2x(t)}$. Define now

$d(t) := \frac{p(t)}{x(t)}$,

so that $c(t) = \frac{d(t)}{2}$.

We will next discover a differential equation that $d(\cdot)$ satisfies. Compute

$\dot d = \frac{\dot p}{x} - \frac{p \dot x}{x^2}$,

and recall that

$\dot x = x + \frac{p}{2}$,  $\dot p = 2x - p$.

Therefore

$\dot d = \frac{2x - p}{x} - \frac{p}{x^2}\left(x + \frac{p}{2}\right) = 2 - d - d\left(1 + \frac{d}{2}\right) = 2 - 2d - \frac{d^2}{2}$.

Since $p(T) = 0$, the terminal condition is $d(T) = 0$.

So we have obtained a nonlinear first-order ODE for $d(\cdot)$ with a terminal boundary
condition:

(R)   $\begin{cases} \dot d = 2 - 2d - \frac{1}{2} d^2 & (0 \le t < T) \\ d(T) = 0. \end{cases}$

This is called the Riccati equation.

In summary so far, to solve our linear-quadratic regulator problem, we need to
first solve the Riccati equation (R) and then set

$\alpha(t) = \frac{1}{2} d(t) x(t)$.



How to solve the Riccati equation. It turns out that we can convert (R)
into a second-order, linear ODE. To accomplish this, write

$d(t) = \frac{2\dot b(t)}{b(t)}$

for a function $b(\cdot)$ to be found. What equation does $b(\cdot)$ solve? We compute

$\dot d = \frac{2\ddot b}{b} - \frac{2(\dot b)^2}{b^2} = \frac{2\ddot b}{b} - \frac{d^2}{2}$.

Hence (R) gives

$\frac{2\ddot b}{b} = \dot d + \frac{d^2}{2} = 2 - 2d = 2 - 2\,\frac{2\dot b}{b}$;

and consequently

$\begin{cases} \ddot b = b - 2\dot b & (0 \le t < T) \\ \dot b(T) = 0, \; b(T) = 1. \end{cases}$

This is a terminal-value problem for a second-order linear ODE, which we can solve
by standard techniques. We then set $d = \frac{2\dot b}{b}$, to derive the solution of the Riccati
equation (R).
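
To see the two routes agree, the following Python sketch integrates the Riccati equation $\dot d = 2 - 2d - \frac{1}{2}d^2$, $d(T) = 0$, backward in time with a classical Runge-Kutta scheme and compares it with $d = 2\dot b / b$. The closed form used for $b$ is my own solution of $\ddot b = b - 2\dot b$, $b(T) = 1$, $\dot b(T) = 0$ (characteristic roots $-1 \pm \sqrt 2$); it is not taken from the notes, and the step count is an arbitrary choice.

```python
import math

def riccati_rhs(d):
    # Right-hand side of (R): d' = 2 - 2 d - d^2 / 2
    return 2.0 - 2.0 * d - 0.5 * d * d

def solve_riccati(T, t, steps=2000):
    """Integrate (R) backward from d(T) = 0 to time t with classical RK4."""
    h = (t - T) / steps          # negative step: we march from T down to t
    d = 0.0
    for _ in range(steps):
        k1 = riccati_rhs(d)
        k2 = riccati_rhs(d + 0.5 * h * k1)
        k3 = riccati_rhs(d + 0.5 * h * k2)
        k4 = riccati_rhs(d + h * k3)
        d += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return d

def d_closed_form(T, t):
    """d = 2 b'/b, with b solving b'' = b - 2 b', b(T) = 1, b'(T) = 0."""
    r2 = math.sqrt(2.0)
    s = r2 * (T - t)
    b = math.exp(T - t) * (math.cosh(s) - math.sinh(s) / r2)
    b_dot = -math.exp(T - t) * math.sinh(s) / r2
    return 2.0 * b_dot / b
```

Both functions return a negative $d(t)$ for $t < T$, so the feedback $\alpha = \frac{1}{2} d x$ pushes the state back toward 0.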

We will generalize this example later to systems, in §5.2. □ 

4.4.4 EXAMPLE 4: MOON LANDER. This is a much more elaborate 
and interesting example, already introduced in Chapter 1. We follow the discussion 
of Fleming and Rishel [F-R] . 

Introduce the notation

$h(t)$ = height at time $t$
$v(t)$ = velocity = $\dot h(t)$
$m(t)$ = mass of spacecraft (changing as fuel is used up)
$\alpha(t)$ = thrust at time $t$.

The thrust is constrained so that $0 \le \alpha(t) \le 1$; that is, $A = [0, 1]$. There are also
the constraints that the height and mass be nonnegative: $h(t) \ge 0$, $m(t) \ge 0$.
The dynamics are

(ODE)   $\begin{cases} \dot h(t) = v(t) \\ \dot v(t) = -g + \frac{\alpha(t)}{m(t)} \\ \dot m(t) = -k\alpha(t), \end{cases}$

with initial conditions

$\begin{cases} h(0) = h_0 > 0 \\ v(0) = v_0 \\ m(0) = m_0 > 0. \end{cases}$

The goal is to land on the moon safely, maximizing the remaining fuel $m(\tau)$,
where $\tau = \tau[\alpha(\cdot)]$ is the first time $h(\tau) = v(\tau) = 0$. Since $\alpha = -\frac{\dot m}{k}$, our intention is
equivalently to minimize the total applied thrust before landing; so that

(P)   $P[\alpha(\cdot)] = -\int_0^\tau \alpha(t) \, dt$.

This is so since

$\int_0^\tau \alpha(t) \, dt = \frac{m_0 - m(\tau)}{k}$.





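Introducing the maximum principle. In terms of the general notation, we
have

$x(t) = \begin{pmatrix} h(t) \\ v(t) \\ m(t) \end{pmatrix}$,  $f = \begin{pmatrix} v \\ -g + a/m \\ -ka \end{pmatrix}$.

Hence the Hamiltonian is

$H(x, p, a) = f \cdot p + r$
$\quad = (v, -g + a/m, -ka) \cdot (p_1, p_2, p_3) - a$
$\quad = -a + p_1 v + p_2\left(-g + \frac{a}{m}\right) + p_3(-ka)$.

We next have to figure out the adjoint dynamics (ADJ). For our particular
Hamiltonian,

$H_{x_1} = H_h = 0$,  $H_{x_2} = H_v = p_1$,  $H_{x_3} = H_m = -\frac{p_2 a}{m^2}$.

Therefore

(ADJ)   $\begin{cases} \dot p^1(t) = 0 \\ \dot p^2(t) = -p^1(t) \\ \dot p^3(t) = \frac{p^2(t)\alpha(t)}{m(t)^2}. \end{cases}$

The maximization condition (M) reads

(M)   $H(x(t), p(t), \alpha(t)) = \max_{0 \le a \le 1} H(x(t), p(t), a)$
$\quad = \max_{0 \le a \le 1} \left\{ -a + p^1(t)v(t) + p^2(t)\left[-g + \frac{a}{m(t)}\right] + p^3(t)(-ka) \right\}$
$\quad = p^1(t)v(t) - p^2(t)g + \max_{0 \le a \le 1} \left\{ a\left(-1 + \frac{p^2(t)}{m(t)} - kp^3(t)\right) \right\}$.

Thus the optimal control law is given by the rule:

$\alpha(t) = \begin{cases} 1 & \text{if } 1 - \frac{p^2(t)}{m(t)} + kp^3(t) < 0 \\ 0 & \text{if } 1 - \frac{p^2(t)}{m(t)} + kp^3(t) > 0. \end{cases}$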



Using the maximum principle. Now we will attempt to figure out the form
of the solution, and check it accords with the Maximum Principle.

Let us start by guessing that we first leave the rocket engine off (i.e., set $\alpha \equiv 0$) and
turn the engine on only at the end. Denote by $\tau$ the first time that $h(\tau) = v(\tau) = 0$,
meaning that we have landed. We guess that there exists a switching time $t^* < \tau$
when we turn the engines on at full power (i.e., set $\alpha \equiv 1$). Consequently,

$\alpha(t) = \begin{cases} 0 & \text{for } 0 \le t \le t^* \\ 1 & \text{for } t^* < t \le \tau. \end{cases}$



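Therefore, for times $t^* \le t \le \tau$ our ODE becomes

$\begin{cases} \dot h(t) = v(t) \\ \dot v(t) = -g + \frac{1}{m(t)} \\ \dot m(t) = -k \end{cases}$   $(t^* \le t \le \tau)$

with $h(\tau) = 0$, $v(\tau) = 0$, $m(t^*) = m_0$. We solve these dynamics:

$m(t) = m_0 + k(t^* - t)$
$v(t) = g(\tau - t) + \frac{1}{k} \log\left[\frac{m_0 + k(t^* - \tau)}{m_0 + k(t^* - t)}\right]$
$h(t)$ = complicated formula.

Now put $t = t^*$:

$m(t^*) = m_0$
$v(t^*) = g(\tau - t^*) + \frac{1}{k} \log\left[\frac{m_0 + k(t^* - \tau)}{m_0}\right]$
$h(t^*) = -\frac{g(t^* - \tau)^2}{2} - \frac{m_0}{k^2} \log\left[\frac{m_0 + k(t^* - \tau)}{m_0}\right] + \frac{t^* - \tau}{k}$.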
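
The closed-form values at the switching time can be checked numerically. The Python sketch below integrates the powered descent $\dot h = v$, $\dot v = -g + 1/m$, $\dot m = -k$ backward from touchdown and compares the result with the formulas for $v(t^*)$ and $h(t^*)$, written in terms of the burn duration $\tau - t^*$. The parameter values used in any call and the function names are hypothetical, chosen only for illustration.

```python
import math

def switching_state(burn, g, k, m0):
    """(v(t*), h(t*)) from the closed-form expressions, with burn = tau - t*."""
    ratio = (m0 - k * burn) / m0          # m(tau) / m(t*)
    v = g * burn + math.log(ratio) / k
    h = -0.5 * g * burn ** 2 - (m0 / k ** 2) * math.log(ratio) - burn / k
    return v, h

def integrate_backward(burn, g, k, m0, steps=4000):
    """RK4 integration of the powered descent, run backward from touchdown.

    In reversed time s = tau - t the dynamics become
        dh/ds = -v,  dv/ds = g - 1/m,  dm/ds = k,
    starting from h = v = 0 and m = m0 - k*burn at s = 0.
    """
    def rhs(state):
        h, v, m = state
        return (-v, g - 1.0 / m, k)

    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    state = (0.0, 0.0, m0 - k * burn)
    ds = burn / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(shifted(state, k1, 0.5 * ds))
        k3 = rhs(shifted(state, k2, 0.5 * ds))
        k4 = rhs(shifted(state, k3, ds))
        state = tuple(x + ds * (a + 2 * b + 2 * c + d) / 6.0
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state   # (h(t*), v(t*), m(t*))
```

Sweeping `burn` over $[0, m_1/k]$ with `switching_state` traces out the curve of states from which a full-thrust burn lands softly.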



Suppose the total amount of fuel to start with was $m_1$; so that $m_0 - m_1$ is the
weight of the empty spacecraft. When $\alpha \equiv 1$, the fuel is used up at rate $k$. Hence

$k(\tau - t^*) \le m_1$,

and so $0 \le \tau - t^* \le \frac{m_1}{k}$.



Before time $t^*$, we set $\alpha \equiv 0$. Then (ODE) reads

$\begin{cases} \dot h = v \\ \dot v = -g \\ \dot m = 0; \end{cases}$

and thus

$m(t) \equiv m_0$
$v(t) = -gt + v_0$
$h(t) = -\frac{1}{2}gt^2 + tv_0 + h_0$.

[Figure: the powered descent trajectory ($\alpha = 1$) in the $(v, h)$ plane, marked with $\tau - t^* = m_1/k$.]

We combine the formulas for $v(t)$ and $h(t)$, to discover

$h(t) = h_0 - \frac{1}{2g}(v^2(t) - v_0^2)$   $(0 \le t \le t^*)$.

We deduce that the freefall trajectory $(v(t), h(t))$ therefore lies on a parabola

$h = h_0 - \frac{1}{2g}(v^2 - v_0^2)$.




[Figure: the freefall trajectory ($\alpha = 0$) and the powered trajectory ($\alpha = 1$) in the $(v, h)$ plane.]



If we then move along this parabola until we hit the soft-landing curve from 
the previous picture, we can then turn on the rocket engine and land safely. 

In the second case illustrated, we miss the switching curve, and hence cannot land
safely on the moon switching once.

[Figure: the second case, drawn in the $(v, h)$ plane.]



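To justify our guess about the structure of the optimal control, let us now find
the costate $p(\cdot)$ so that $\alpha(\cdot)$ and $x(\cdot)$ described above satisfy (ODE), (ADJ), (M).
To do this, we will have to figure out appropriate initial conditions

$p^1(0) = \lambda_1$,  $p^2(0) = \lambda_2$,  $p^3(0) = \lambda_3$.

We solve (ADJ) for $\alpha(\cdot)$ as above, and find

$p^1(t) \equiv \lambda_1$   $(0 \le t \le \tau)$
$p^2(t) = \lambda_2 - \lambda_1 t$   $(0 \le t \le \tau)$
$p^3(t) = \begin{cases} \lambda_3 & (0 \le t \le t^*) \\ \lambda_3 + \int_{t^*}^t \frac{\lambda_2 - \lambda_1 s}{(m_0 + k(t^* - s))^2} \, ds & (t^* \le t \le \tau). \end{cases}$

Define

$r(t) := 1 - \frac{p^2(t)}{m(t)} + p^3(t)k$;

then

$\dot r = -\frac{\dot p^2}{m} + \frac{p^2 \dot m}{m^2} + \dot p^3 k = \frac{\lambda_1}{m} + \frac{p^2}{m^2}(-k\alpha) + \left(\frac{p^2 \alpha}{m^2}\right)k = \frac{\lambda_1}{m(t)}$.

Choose $\lambda_1 < 0$, so that $r$ is decreasing. We calculate

$r(t^*) = 1 - \frac{\lambda_2 - \lambda_1 t^*}{m_0} + \lambda_3 k$

and then adjust $\lambda_2, \lambda_3$ so that $r(t^*) = 0$.

Then $r$ is nonincreasing, $r(t^*) = 0$, and consequently $r > 0$ on $[0, t^*)$, $r < 0$ on
$(t^*, \tau]$. But (M) says

$\alpha(t) = \begin{cases} 1 & \text{if } r(t) < 0 \\ 0 & \text{if } r(t) > 0. \end{cases}$

Thus an optimal control changes just once from 0 to 1; and so our earlier guess of
$\alpha(\cdot)$ does indeed satisfy the Pontryagin Maximum Principle. □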
4.5 MAXIMUM PRINCIPLE WITH TRANSVERSALITY CONDITIONS



Consider again the dynamics

(ODE)   $\dot x(t) = f(x(t), \alpha(t))$   $(t > 0)$.

In this section we discuss another variant problem, one for which the initial
position is constrained to lie in a given set $X_0 \subset \mathbb{R}^n$ and the final position is also
constrained to lie within a given set $X_1 \subset \mathbb{R}^n$.

So in this model we get to choose the starting point $x^0 \in X_0$ in order to
maximize

(P)   $P[\alpha(\cdot)] = \int_0^\tau r(x(t), \alpha(t)) \, dt$,

where $\tau = \tau[\alpha(\cdot)]$ is the first time we hit $X_1$.

NOTATION. We will assume that $X_0, X_1$ are in fact smooth surfaces in $\mathbb{R}^n$.
We let $T_0$ denote the tangent plane to $X_0$ at $x^0$, and $T_1$ the tangent plane to $X_1$ at
$x^1$.

THEOREM 4.5 (MORE TRANSVERSALITY CONDITIONS). Let $\alpha^*(\cdot)$
and $x^*(\cdot)$ solve the problem above, with

$x^0 = x^*(0)$,  $x^1 = x^*(\tau^*)$.

Then there exists a function $p^*(\cdot) : [0, \tau^*] \to \mathbb{R}^n$, such that (ODE), (ADJ) and
(M) hold for $0 \le t \le \tau^*$. In addition,

(T)   $\begin{cases} p^*(\tau^*) \text{ is perpendicular to } T_1, \\ p^*(0) \text{ is perpendicular to } T_0. \end{cases}$

We call (T) the transversality conditions.

REMARKS AND INTERPRETATIONS. (i) If we have $T > 0$ fixed and

$P[\alpha(\cdot)] = \int_0^T r(x(t), \alpha(t)) \, dt + g(x(T))$,

then (T) says

$p^*(T) = \nabla g(x^*(T))$,

in agreement with our earlier form of the terminal/transversality condition.

(ii) Suppose that the surface $X_1$ is the graph $X_1 = \{x \mid g_k(x) = 0, \; k = 1, \dots, l\}$.
Then (T) says that $p^*(\tau^*)$ belongs to the "orthogonal complement" of the subspace
$T_1$. But the orthogonal complement of $T_1$ is the span of $\nabla g_k(x^1)$ $(k = 1, \dots, l)$. Thus

$p^*(\tau^*) = \sum_{k=1}^l \lambda_k \nabla g_k(x^1)$

for some unknown constants $\lambda_1, \dots, \lambda_l$. □

4.6 MORE APPLICATIONS 

4.6.1 EXAMPLE 1: DISTANCE BETWEEN TWO SETS. As a first
and simple example, let

(ODE)   $\dot x(t) = \alpha(t)$

for $A = S^1$, the unit sphere in $\mathbb{R}^2$: $a \in S^1$ if and only if $|a|^2 = a_1^2 + a_2^2 = 1$. In other
words, we are considering only curves that move with unit speed.
We take

(P)   $P[\alpha(\cdot)] = -\int_0^\tau |\dot x(t)| \, dt = -$ (the length of the curve) $= -\int_0^\tau dt = -$ (time it takes to reach $X_1$).

We want to minimize the length of the curve and, as a check on our general
theory, will prove that the minimum is of course a straight line.

Using the maximum principle. We have

$H(x, p, a) = f(x, a) \cdot p + r(x, a) = a \cdot p - 1 = p_1 a_1 + p_2 a_2 - 1$.

The adjoint dynamics equation (ADJ) says

$\dot p(t) = -\nabla_x H(x(t), p(t), \alpha(t)) = 0$,

and therefore

$p(t) \equiv$ constant $= p^0 \ne 0$.

The maximization principle (M) tells us that

$H(x(t), p(t), \alpha(t)) = \max_{a \in S^1} [-1 + p_1^0 a_1 + p_2^0 a_2]$.

The right hand side is maximized by $a^0 = \frac{p^0}{|p^0|}$, a unit vector that points in the same
direction as $p^0$. Thus $\alpha(\cdot) \equiv a^0$ is constant in time. According then to (ODE) we
have $\dot x = a^0$, and consequently $x(\cdot)$ is a straight line.
Finally, the transversality conditions say that

(T)   $p(0) \perp T_0$,  $p(\tau) \perp T_1$.

In other words, $p^0 \perp T_0$ and $p^0 \perp T_1$; and this means that the tangent planes $T_0$
and $T_1$ are parallel.




Now all of this is pretty obvious from the picture, but it is reassuring that the 
general theory predicts the proper answer. □ 
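
The maximization step in this example, namely that $a \cdot p^0 - 1$ over unit vectors $a$ is largest at $a = p^0/|p^0|$, can be checked by brute force. The Python sketch below samples the unit circle on a uniform grid of angles; the function name and the sample count are illustrative choices.

```python
import math

def best_direction(p, samples=3600):
    """Brute-force maximizer of a . p - 1 over unit vectors a = (cos s, sin s)."""
    best = None
    for i in range(samples):
        s = 2.0 * math.pi * i / samples
        a = (math.cos(s), math.sin(s))
        value = a[0] * p[0] + a[1] * p[1] - 1.0
        if best is None or value > best[0]:
            best = (value, a)
    return best   # (maximum value, maximizing unit vector)
```

Up to the grid resolution, the returned vector is $p/|p|$ and the returned value is $|p| - 1$.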

4.6.2 EXAMPLE 2: COMMODITY TRADING. Next is a simple model
for the trading of a commodity, say wheat. We let $T$ be the fixed length of the trading
period, and introduce the variables

$x^1(t)$ = money on hand at time $t$
$x^2(t)$ = amount of wheat owned at time $t$
$\alpha(t)$ = rate of buying or selling of wheat
$q(t)$ = price of wheat at time $t$ (known)
$\lambda$ = cost of storing a unit amount of wheat for a unit of time.

We suppose that the price of wheat $q(t)$ is known for the entire trading period
$0 \le t \le T$ (although this is probably unrealistic in practice). We assume also that
the rate of selling and buying is constrained:

$|\alpha(t)| \le M$,

where $\alpha(t) > 0$ means buying wheat, and $\alpha(t) < 0$ means selling.

Our intention is to maximize our holdings at the end time $T$, namely the sum
of the cash on hand and the value of the wheat we then own:

(P)   $P[\alpha(\cdot)] = x^1(T) + q(T)x^2(T)$.

The evolution is

(ODE)   $\begin{cases} \dot x^1(t) = -\lambda x^2(t) - q(t)\alpha(t) \\ \dot x^2(t) = \alpha(t). \end{cases}$

This is a nonautonomous (= time dependent) case, but it turns out that the Pontryagin
Maximum Principle still applies.

Using the maximum principle. What is our optimal buying and selling
strategy? First, we compute the Hamiltonian

$H(x, p, t, a) = f \cdot p + r = p_1(-\lambda x_2 - q(t)a) + p_2 a$,

since $r \equiv 0$. The adjoint dynamics read

(ADJ)   $\begin{cases} \dot p^1 = 0 \\ \dot p^2 = \lambda p^1, \end{cases}$

with the terminal condition

(T)   $p(T) = \nabla g(x(T))$.

In our case $g(x_1, x_2) = x_1 + q(T)x_2$, and hence

(T)   $\begin{cases} p^1(T) = 1 \\ p^2(T) = q(T). \end{cases}$

We then can solve for the costate:

$p^1(t) \equiv 1$
$p^2(t) = \lambda(t - T) + q(T)$.

The maximization principle (M) tells us that

(M)   $H(x(t), p(t), t, \alpha(t)) = \max_{|a| \le M} \{ p^1(t)(-\lambda x^2(t) - q(t)a) + p^2(t)a \}$
$\quad = -\lambda p^1(t)x^2(t) + \max_{|a| \le M} \{ a(-q(t) + p^2(t)) \}$.

So

$\alpha(t) = \begin{cases} M & \text{if } q(t) < p^2(t) \\ -M & \text{if } q(t) > p^2(t) \end{cases}$

for $p^2(t) := \lambda(t - T) + q(T)$.

CRITIQUE. In some situations the amount of money on hand $x^1(\cdot)$ becomes
negative for part of the time. The economic problem has a natural constraint $x^1 \ge 0$
(unless we can borrow with no interest charges) which we did not take into account
in the mathematical model. □
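
The trading rule is easy to try out on a concrete price curve. The Python sketch below implements the rule "buy at rate $M$ when $q(t) < p^2(t)$, otherwise sell at rate $M$" with $p^2(t) = \lambda(t - T) + q(T)$, and integrates the dynamics $\dot x^1 = -\lambda x^2 - q\alpha$, $\dot x^2 = \alpha$ by forward Euler. The price function passed in, the step count, and the tie-breaking choice when $q(t) = p^2(t)$ are assumptions for illustration. Like the model itself, the simulation does not enforce $x^1 \ge 0$ or $x^2 \ge 0$.

```python
def costate_p2(t, T, lam, q_T):
    # p^2(t) = lambda (t - T) + q(T), from (ADJ) and (T)
    return lam * (t - T) + q_T

def trading_rule(t, q, T, lam, M):
    """Bang-bang control from (M): buy at rate M if q(t) < p^2(t), else sell.

    Ties (q(t) == p^2(t)) are resolved as 'sell'; that choice is arbitrary.
    """
    return M if q(t) < costate_p2(t, T, lam, q(T)) else -M

def final_holdings(control, q, T, lam, x1=0.0, x2=0.0, steps=10000):
    """Forward Euler for x1' = -lam*x2 - q*alpha, x2' = alpha; returns the payoff."""
    dt = T / steps
    for i in range(steps):
        t = i * dt
        a = control(t)
        x1, x2 = x1 + dt * (-lam * x2 - q(t) * a), x2 + dt * a
    return x1 + q(T) * x2
```

Comparing `final_holdings` for the rule against a constant control such as "do nothing" gives a quick plausibility check for a chosen price curve.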

4.7 MAXIMUM PRINCIPLE WITH STATE CONSTRAINTS 

We return once again to our usual setting:

(ODE)   $\begin{cases} \dot x(t) = f(x(t), \alpha(t)) \\ x(0) = x^0, \end{cases}$

(P)   $P[\alpha(\cdot)] = \int_0^\tau r(x(t), \alpha(t)) \, dt$

for $\tau = \tau[\alpha(\cdot)]$, the first time that $x(\tau) = x^1$. This is the fixed endpoint problem.

STATE CONSTRAINTS. We introduce a new complication by asking that
our dynamics $x(\cdot)$ must always remain within a given region $R \subset \mathbb{R}^n$. We will as
above suppose that $R$ has the explicit representation

$R = \{x \in \mathbb{R}^n \mid g(x) \le 0\}$

for a given function $g(\cdot) : \mathbb{R}^n \to \mathbb{R}$.

DEFINITION. It will be convenient to introduce the quantity

$c(x, a) := \nabla g(x) \cdot f(x, a)$.

Notice that

if $x(t) \in \partial R$ for times $s_0 \le t \le s_1$, then $c(x(t), \alpha(t)) \equiv 0$ $(s_0 \le t \le s_1)$.

This is so since $f$ is then tangent to $\partial R$, whereas $\nabla g$ is perpendicular.

THEOREM 4.6 (MAXIMUM PRINCIPLE FOR STATE CONSTRAINTS).
Let $\alpha^*(\cdot), x^*(\cdot)$ solve the control theory problem above. Suppose also that $x^*(t) \in \partial R$
for $s_0 \le t \le s_1$.

Then there exists a costate function $p^*(\cdot) : [s_0, s_1] \to \mathbb{R}^n$ such that (ODE) holds.
There also exists $\lambda^*(\cdot) : [s_0, s_1] \to \mathbb{R}$ such that for times $s_0 \le t \le s_1$ we have

(ADJ')   $\dot p^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t)) + \lambda^*(t) \nabla_x c(x^*(t), \alpha^*(t))$;

and

(M')   $H(x^*(t), p^*(t), \alpha^*(t)) = \max_{a \in A} \{ H(x^*(t), p^*(t), a) \mid c(x^*(t), a) = 0 \}$.

To keep things simple, we have omitted some technical assumptions really
needed for the Theorem to be valid.

REMARKS AND INTERPRETATIONS. (i) Let $A \subset \mathbb{R}^m$ be of this form:

$A = \{a \in \mathbb{R}^m \mid g_1(a) \le 0, \dots, g_s(a) \le 0\}$

for given functions $g_1, \dots, g_s : \mathbb{R}^m \to \mathbb{R}$. In this case we can use Lagrange multipliers
to deduce from (M') that

(M'')   $\nabla_a H(x^*(t), p^*(t), \alpha^*(t)) = \lambda^*(t) \nabla_a c(x^*(t), \alpha^*(t)) + \sum_{i=1}^s \mu_i^*(t) \nabla_a g_i(\alpha^*(t))$.

The function $\lambda^*(\cdot)$ here is that appearing in (ADJ').

If $x^*(t)$ lies in the interior of $R$ for say the times $0 \le t < s_0$, then the ordinary
Maximum Principle holds.

(ii) Jump conditions. In the situation above, we always have

$p^*(s_0 - 0) = p^*(s_0 + 0)$,

where $s_0$ is a time that $x^*$ hits $\partial R$. In other words, there is no jump in $p^*$ when
we hit the boundary of the constraint $\partial R$.
However,

$p^*(s_1 + 0) = p^*(s_1 - 0) - \lambda^*(s_1) \nabla g(x^*(s_1))$;

this says there is (possibly) a jump in $p^*(\cdot)$ when we leave $\partial R$. □

4.8 MORE APPLICATIONS 

4.8.1 EXAMPLE 1: SHORTEST DISTANCE BETWEEN TWO POINTS,
AVOIDING AN OBSTACLE.

What is the shortest path between two points that avoids the disk $B = B(0, r)$,
as drawn?

Let us take

(ODE)   $\begin{cases} \dot x(t) = \alpha(t) \\ x(0) = x^0 \end{cases}$

for $A = S^1$, with the payoff

(P)   $P[\alpha(\cdot)] = -\int_0^\tau |\dot x| \, dt = -$ length of the curve $x(\cdot)$.

We have

$H(x, p, a) = f \cdot p + r = p_1 a_1 + p_2 a_2 - 1$.

Case 1: avoiding the obstacle. Assume $x(t) \notin \partial B$ on some time interval.
In this case, the usual Pontryagin Maximum Principle applies, and we deduce as
before that

$\dot p = -\nabla_x H = 0$.

Hence

(ADJ)   $p(t) \equiv$ constant $= p^0$.

Condition (M) says

$H(x(t), p(t), \alpha(t)) = \max_{a \in S^1} (-1 + p_1^0 a_1 + p_2^0 a_2)$.

The maximum occurs for $\alpha = \frac{p^0}{|p^0|}$. Furthermore,

$-1 + p_1^0 \alpha_1 + p_2^0 \alpha_2 \equiv 0$;

and therefore $\alpha \cdot p^0 = 1$. This means that $|p^0| = 1$, and hence in fact $\alpha = p^0$. We
have proved that the trajectory $x(\cdot)$ is a straight line away from the obstacle.

Case 2: touching the obstacle. Suppose now $x(t) \in \partial B$ for some time
interval $s_0 \le t \le s_1$. Now we use the modified version of the Maximum Principle,
provided by Theorem 4.6.

First we must calculate $c(x, a) = \nabla g(x) \cdot f(x, a)$. In our case,

$R = \mathbb{R}^2 - B = \{x \mid x_1^2 + x_2^2 \ge r^2\} = \{x \mid g := r^2 - x_1^2 - x_2^2 \le 0\}$.

Then $\nabla g = \begin{pmatrix} -2x_1 \\ -2x_2 \end{pmatrix}$. Since $f = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$, we have

$c(x, a) = -2a_1 x_1 - 2a_2 x_2$.

Now condition (ADJ') implies

$\dot p(t) = -\nabla_x H + \lambda(t) \nabla_x c$;

which is to say,

(4.6)   $\begin{cases} \dot p^1 = -2\lambda\alpha^1 \\ \dot p^2 = -2\lambda\alpha^2. \end{cases}$

Next, we employ the maximization principle (M'). We need to maximize

$H(x(t), p(t), a)$

subject to the requirements that $c(x(t), a) = 0$ and $g_1(a) = a_1^2 + a_2^2 - 1 = 0$, since
$A = \{a \in \mathbb{R}^2 \mid a_1^2 + a_2^2 = 1\}$. According to (M'') we must solve

$\nabla_a H = \lambda(t) \nabla_a c + \mu(t) \nabla_a g_1$;

that is,

$\begin{cases} p^1 = \lambda(-2x^1) + \mu 2\alpha^1 \\ p^2 = \lambda(-2x^2) + \mu 2\alpha^2. \end{cases}$

We can combine these identities to eliminate $\mu$. Since we also know that $x(t) \in \partial B$,
we have $(x^1)^2 + (x^2)^2 = r^2$; and also $\alpha = (\alpha^1, \alpha^2)^T$ is tangent to $\partial B$. Using these
facts, we find after some calculations that

(4.7)   $\lambda = \frac{p^2 \alpha^1 - p^1 \alpha^2}{2r}$.

But we also know

(4.8)   $(\alpha^1)^2 + (\alpha^2)^2 = 1$

and

$H \equiv 0 = -1 + p^1 \alpha^1 + p^2 \alpha^2$;

hence

(4.9)   $p^1 \alpha^1 + p^2 \alpha^2 \equiv 1$.

Solving for the unknowns. We now have the five equations (4.6)-(4.9)
for the five unknown functions $p^1, p^2, \alpha^1, \alpha^2, \lambda$ that depend on $t$. We introduce the
angle $\theta$, as illustrated, and note that $\frac{d}{d\theta} = r \frac{d}{dt}$. A calculation then confirms that
the solutions are

$\begin{cases} \alpha^1(\theta) = -\sin\theta \\ \alpha^2(\theta) = \cos\theta, \end{cases}$

$\lambda = -\frac{k + \theta}{2r}$,

and

$\begin{cases} p^1(\theta) = k\cos\theta - \sin\theta + \theta\cos\theta \\ p^2(\theta) = k\sin\theta + \cos\theta + \theta\sin\theta \end{cases}$

for some constant $k$.
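
The "calculation" that confirms these formulas can be delegated to a short numerical check. The Python sketch below evaluates the candidate $\alpha(\theta) = (-\sin\theta, \cos\theta)$, $p(\theta)$ and $\lambda = -(k + \theta)/(2r)$, and measures how badly they violate (4.6), (4.7) and (4.9). The time derivative in (4.6) is approximated by a central difference in $\theta$, using $\frac{d}{dt} = \frac{1}{r} \frac{d}{d\theta}$. The sample values of $k$ and $r$ used in any call are arbitrary, and the function names are illustrative.

```python
import math

def solution(theta, k, r):
    """Candidate solution on the circle of radius r, parameterized by theta."""
    a = (-math.sin(theta), math.cos(theta))
    p = (k * math.cos(theta) - math.sin(theta) + theta * math.cos(theta),
         k * math.sin(theta) + math.cos(theta) + theta * math.sin(theta))
    lam = -(k + theta) / (2.0 * r)
    return a, p, lam

def residuals(theta, k, r, eps=1e-6):
    """Largest violation of (4.6), (4.7) and (4.9) at the angle theta."""
    a, p, lam = solution(theta, k, r)
    _, p_plus, _ = solution(theta + eps, k, r)
    _, p_minus, _ = solution(theta - eps, k, r)
    # d/dt = (1/r) d/dtheta, approximated by a central difference
    p_dot = tuple((hi - lo) / (2.0 * eps * r) for hi, lo in zip(p_plus, p_minus))
    res_46 = max(abs(p_dot[i] + 2.0 * lam * a[i]) for i in range(2))
    res_47 = abs(lam - (p[1] * a[0] - p[0] * a[1]) / (2.0 * r))
    res_49 = abs(p[0] * a[0] + p[1] * a[1] - 1.0)
    return max(res_46, res_47, res_49)
```

Equation (4.8) holds trivially for this $\alpha$, so it is not included in the residual.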
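Case 3: approaching and leaving the obstacle. In general, we must piece
together the results from Case 1 and Case 2. So suppose now $x(t) \in R = \mathbb{R}^2 - B$
for $0 \le t < s_0$ and $x(t) \in \partial B$ for $s_0 \le t \le s_1$.

We have shown that for times $0 \le t < s_0$, the trajectory $x(\cdot)$ is a straight line.
For this case we have shown already that $p = \alpha$ and therefore

$\begin{cases} p^1 \equiv -\cos\phi_0 \\ p^2 \equiv \sin\phi_0, \end{cases}$

for the angle $\phi_0$ as shown in the picture.

By the jump conditions, $p(\cdot)$ is continuous when $x(\cdot)$ hits $\partial B$ at the time $s_0$,
meaning in this case that

$\begin{cases} k\cos\theta_0 - \sin\theta_0 + \theta_0\cos\theta_0 = -\cos\phi_0 \\ k\sin\theta_0 + \cos\theta_0 + \theta_0\sin\theta_0 = \sin\phi_0. \end{cases}$

These identities hold if and only if

$\begin{cases} k = -\theta_0 \\ \theta_0 + \phi_0 = \frac{\pi}{2}. \end{cases}$

The second equality says that the optimal trajectory is tangent to the disk $B$ when
it hits $\partial B$.

We turn next to the trajectory as it leaves $\partial B$: see the next picture. We then
have

$\begin{cases} p^1(\theta_1^-) = -\theta_0\cos\theta_1 - \sin\theta_1 + \theta_1\cos\theta_1 \\ p^2(\theta_1^-) = -\theta_0\sin\theta_1 + \cos\theta_1 + \theta_1\sin\theta_1. \end{cases}$

Now our formulas above for $\lambda$ and $k$ imply $\lambda(\theta_1) = \frac{\theta_0 - \theta_1}{2r}$. The jump conditions
give

$p(\theta_1^+) = p(\theta_1^-) - \lambda(\theta_1) \nabla g(x(\theta_1))$

for $g(x) = r^2 - x_1^2 - x_2^2$. Then

$\lambda(\theta_1) \nabla g(x(\theta_1)) = (\theta_1 - \theta_0) \begin{pmatrix} \cos\theta_1 \\ \sin\theta_1 \end{pmatrix}$.

Therefore

$\begin{cases} p^1(\theta_1^+) = -\sin\theta_1 \\ p^2(\theta_1^+) = \cos\theta_1, \end{cases}$

and so the trajectory is tangent to $\partial B$. If we apply the usual Maximum Principle after
$x(\cdot)$ leaves $B$, we find

$\begin{cases} p^1 \equiv \text{constant} = -\cos\phi_1 \\ p^2 \equiv \text{constant} = -\sin\phi_1. \end{cases}$

Thus

$\begin{cases} -\cos\phi_1 = -\sin\theta_1 \\ -\sin\phi_1 = \cos\theta_1 \end{cases}$

and so $\phi_1 + \theta_1 = \pi$.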



CRITIQUE. We have carried out elaborate calculations to derive some pretty 
obvious conclusions in this example. It is best to think of this as a confirmation in 
a simple case of Theorem 4.6, which applies in far more complicated situations. 

□ 
4.8.2 AN INVENTORY CONTROL MODEL. Now we turn to a simple
model for ordering and storing items in a warehouse. Let the time period $T > 0$ be
given, and introduce the variables

$x(t)$ = amount of inventory at time $t$
$\alpha(t)$ = rate of ordering from manufacturers, $\alpha \ge 0$,
$d(t)$ = customer demand (known)
$\gamma$ = cost of ordering 1 unit
$\beta$ = cost of storing 1 unit.

Our goal is to fill all customer orders shipped from our warehouse, while keeping
our storage and ordering costs at a minimum. Hence the payoff to be maximized is

(P)   $P[\alpha(\cdot)] = -\int_0^T \gamma\alpha(t) + \beta x(t) \, dt$.

We have $A = [0, \infty)$ and the constraint that $x(t) \ge 0$. The dynamics are

(ODE)   $\begin{cases} \dot x(t) = \alpha(t) - d(t) & (0 \le t \le T) \\ x(0) = x^0. \end{cases}$



Guessing the optimal strategy. Let us just guess the optimal control strategy:
we should at first not order anything ($\alpha = 0$) and let the inventory in our
warehouse fall off to zero as we fill demands; thereafter we should order just enough
to meet our demands ($\alpha = d$).



[Figure: the guessed inventory level $x$ plotted against time $t$.]



Using the maximum principle. We will prove this guess is right, using the
Maximum Principle. Assume first that $x(t) > 0$ on some interval $[0, s_0]$. We then
have

$H(x, p, a, t) = (a - d(t))p - \gamma a - \beta x$;

and (ADJ) says $\dot p = -\nabla_x H = \beta$. Condition (M) implies

$H(x(t), p(t), \alpha(t), t) = \max_{a \ge 0} \{ -\gamma a - \beta x(t) + p(t)(a - d(t)) \}$
$\quad = -\beta x(t) - p(t)d(t) + \max_{a \ge 0} \{ a(p(t) - \gamma) \}$.

Thus

$\alpha(t) = \begin{cases} 0 & \text{if } p(t) \le \gamma \\ +\infty & \text{if } p(t) > \gamma. \end{cases}$

If $\alpha(t) \equiv +\infty$ on some interval, then $P[\alpha(\cdot)] = -\infty$, which is impossible, because
there exists a control with finite payoff. So it follows that $\alpha(\cdot) \equiv 0$ on $[0, s_0]$: we
place no orders.

According to (ODE), we have

$\begin{cases} \dot x(t) = -d(t) & (0 \le t \le s_0) \\ x(0) = x^0. \end{cases}$

Thus $s_0$ is the first time the inventory hits 0. Now since $x(t) = x^0 - \int_0^t d(s) \, ds$, we
have $x(s_0) = 0$. That is, $\int_0^{s_0} d(s) \, ds = x^0$ and we have hit the constraint. Now use the
Pontryagin Maximum Principle with state constraint for times $t \ge s_0$:

$R = \{x \ge 0\} = \{g(x) := -x \le 0\}$

and

$c(x, a, t) = \nabla g(x) \cdot f(x, a, t) = (-1)(a - d(t)) = d(t) - a$.

We have

(M')   $H(x(t), p(t), \alpha(t), t) = \max_{a \ge 0} \{ H(x(t), p(t), a, t) \mid c(x(t), a, t) = 0 \}$.

But $c(x(t), \alpha(t), t) = 0$ if and only if $\alpha(t) = d(t)$. Then (ODE) reads

$\dot x(t) = \alpha(t) - d(t) = 0$

and so $x(t) = 0$ for all times $t \ge s_0$.

We have confirmed that our guess for the optimal strategy was right. □
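
The resulting policy is simple to simulate. The Python sketch below finds the stockout time $s_0$, defined by $\int_0^{s_0} d(s) \, ds = x^0$, using a midpoint rule, and then applies the policy "order nothing before $s_0$, order exactly the demand afterwards" with a forward Euler step. The demand function passed in, the step size and the function names are assumptions made for illustration.

```python
def stockout_time(x0, demand, T, dt=1e-3):
    """First time s0 at which cumulative demand has used up the stock x0.

    Uses the midpoint rule for the integral of d; returns T if the stock lasts.
    """
    t, used = 0.0, 0.0
    while t < T:
        used += demand(t + 0.5 * dt) * dt
        t += dt
        if used >= x0:
            return t
    return T

def order_rate(t, s0, demand):
    """Guessed policy: order nothing before s0, then order exactly the demand."""
    return 0.0 if t < s0 else demand(t)

def simulate_inventory(x0, demand, T, dt=1e-3):
    """Forward Euler for x' = alpha - d under the policy above."""
    s0 = stockout_time(x0, demand, T, dt)
    x, t, levels = x0, 0.0, []
    while t < T:
        x += dt * (order_rate(t, s0, demand) - demand(t))
        t += dt
        levels.append(x)
    return s0, levels
```

With a constant demand of 2 units per unit time and $x^0 = 3$, the stock should run out near $s_0 = 1.5$ and the inventory should stay near 0 from then on, up to discretization error.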



4.9 REFERENCES 

We have mostly followed the books of Fleming-Rishel [F-R], Macki-Strauss 
[M-S] and Knowles [K]. 

Classic references are Pontryagin-Boltyanski-Gamkrelidze-Mishchenko [P-B-G-M]
and Lee-Markus [L-M]. Clarke's booklet [C2] is a nicely written introduction
to a more modern viewpoint, based upon "nonsmooth analysis".
CHAPTER 5: DYNAMIC PROGRAMMING



5.1 Derivation of Bellman's PDE 

5.2 Examples 

5.3 Relationship with Pontryagin Maximum Principle 

5.4 References 



5.1 DERIVATION OF BELLMAN'S PDE 

5.1.1 DYNAMIC PROGRAMMING. We begin with some mathematical wis- 
dom: "It is sometimes easier to solve a problem by embedding it within a larger 
class of problems and then solving the larger class all at once." 

A CALCULUS EXAMPLE. Suppose we wish to calculate the value of the
integral

$\int_0^\infty \frac{\sin x}{x} \, dx$.

This is pretty hard to do directly, so let us as follows add a parameter $\alpha$ into the
integral:

$I(\alpha) := \int_0^\infty e^{-\alpha x} \frac{\sin x}{x} \, dx$.

We compute

$I'(\alpha) = \int_0^\infty (-x) e^{-\alpha x} \frac{\sin x}{x} \, dx = -\int_0^\infty \sin x \, e^{-\alpha x} \, dx = -\frac{1}{\alpha^2 + 1}$,

where we integrated by parts twice to find the last equality. Consequently

$I(\alpha) = -\arctan\alpha + C$,

and we must compute the constant $C$. To do so, observe that

$0 = I(\infty) = -\arctan(\infty) + C = -\frac{\pi}{2} + C$,

and so $C = \frac{\pi}{2}$. Hence $I(\alpha) = -\arctan\alpha + \frac{\pi}{2}$, and consequently

$\int_0^\infty \frac{\sin x}{x} \, dx = I(0) = \frac{\pi}{2}$.

□
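
The formula $I(\alpha) = \frac{\pi}{2} - \arctan\alpha$ can be tested numerically for a few values $\alpha > 0$, where the integrand decays exponentially. The Python sketch below uses a composite Simpson rule on a truncated interval; the cutoff, the number of subintervals and the function names are illustrative assumptions. The case $\alpha = 0$ is deliberately left out, since the undamped integrand decays too slowly for this simple truncation.

```python
import math

def integrand(x, alpha):
    # e^{-alpha x} sin(x)/x, with the removable singularity at x = 0 filled in
    if x == 0.0:
        return 1.0
    return math.exp(-alpha * x) * math.sin(x) / x

def simpson(f, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def I_numeric(alpha, cutoff=60.0, n=60000):
    """Approximate I(alpha) by integrating over [0, cutoff] only."""
    return simpson(lambda x: integrand(x, alpha), 0.0, cutoff, n)

def I_formula(alpha):
    return math.pi / 2 - math.atan(alpha)
```

For instance, `I_numeric(1.0)` and `I_formula(1.0)` should both be close to $\pi/4$.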



We want to adapt some version of this idea to the vastly more complicated
setting of control theory. For this, fix a terminal time $T > 0$ and then look at the
controlled dynamics

(ODE)   $\begin{cases} \dot x(s) = f(x(s), \alpha(s)) & (0 < s < T) \\ x(0) = x^0, \end{cases}$

with the associated payoff functional

(P)   $P[\alpha(\cdot)] = \int_0^T r(x(s), \alpha(s)) \, ds + g(x(T))$.

We embed this into a larger family of similar problems, by varying the starting
times and starting points:

(5.1)   $\begin{cases} \dot x(s) = f(x(s), \alpha(s)) & (t \le s \le T) \\ x(t) = x, \end{cases}$

with

(5.2)   $P_{x,t}[\alpha(\cdot)] = \int_t^T r(x(s), \alpha(s)) \, ds + g(x(T))$.

Consider the above problems for all choices of starting times $0 \le t \le T$ and all
initial points $x \in \mathbb{R}^n$.

DEFINITION. For $x \in \mathbb{R}^n$, $0 \le t \le T$, define the value function $v(x, t)$ to be
the greatest payoff possible if we start at $x \in \mathbb{R}^n$ at time $t$. In other words,

(5.3)   $v(x, t) := \sup_{\alpha(\cdot) \in \mathcal{A}} P_{x,t}[\alpha(\cdot)]$   $(x \in \mathbb{R}^n, \; 0 \le t \le T)$.

Notice then that

(5.4)   $v(x, T) = g(x)$   $(x \in \mathbb{R}^n)$.

5.1.2 DERIVATION OF HAMILTON-JACOBI-BELLMAN EQUATION.

Our first task is to show that the value function $v$ satisfies a certain nonlinear partial
differential equation.

Our derivation will be based upon the reasonable principle that "it's better to
be smart from the beginning, than to be stupid for a time and then become smart".
We want to convert this philosophy of life into mathematics.

To simplify, we hereafter suppose that the set $A$ of control parameter values is
compact.

THEOREM 5.1 (HAMILTON-JACOBI-BELLMAN EQUATION). Assume
that the value function $v$ is a $C^1$ function of the variables $(x, t)$. Then $v$ solves the
nonlinear partial differential equation

(HJB)   $v_t(x, t) + \max_{a \in A} \{ f(x, a) \cdot \nabla_x v(x, t) + r(x, a) \} = 0$   $(x \in \mathbb{R}^n, \; 0 \le t < T)$,

with the terminal condition

$v(x, T) = g(x)$   $(x \in \mathbb{R}^n)$.

REMARK. We call (HJB) the Hamilton-Jacobi-Bellman equation, and can rewrite
it as

(HJB)   $v_t(x, t) + H(x, \nabla_x v) = 0$   $(x \in \mathbb{R}^n, \; 0 \le t < T)$,

for the partial differential equations Hamiltonian

$H(x, p) := \max_{a \in A} H(x, p, a) = \max_{a \in A} \{ f(x, a) \cdot p + r(x, a) \}$

where $x, p \in \mathbb{R}^n$. □
Proof. 1. Let $x \in \mathbb{R}^n$, $0 \le t < T$ and let $h > 0$ be given. As always

$\mathcal{A} = \{ \alpha(\cdot) : [0, \infty) \to A \text{ measurable} \}$.

Pick any parameter $a \in A$ and use the constant control

$\alpha(\cdot) \equiv a$

for times $t \le s \le t + h$. The dynamics then arrive at the point $x(t + h)$, where
$t + h < T$. Suppose now at time $t + h$, we switch to an optimal control and use it
for the remaining times $t + h \le s \le T$.

What is the payoff of this procedure? Now for times $t \le s \le t + h$, we have

$\begin{cases} \dot x(s) = f(x(s), a) \\ x(t) = x. \end{cases}$

The payoff for this time period is $\int_t^{t+h} r(x(s), a) \, ds$. Furthermore, the payoff
incurred from time $t + h$ to $T$ is $v(x(t + h), t + h)$, according to the definition of the
value function $v$. Hence the total payoff is

$\int_t^{t+h} r(x(s), a) \, ds + v(x(t + h), t + h)$.

But the greatest possible payoff if we start from $(x, t)$ is $v(x, t)$. Therefore

(5.5)   $v(x, t) \ge \int_t^{t+h} r(x(s), a) \, ds + v(x(t + h), t + h)$.

2. We now want to convert this inequality into a differential form. So we
rearrange (5.5) and divide by $h > 0$:

$\frac{v(x(t + h), t + h) - v(x, t)}{h} + \frac{1}{h} \int_t^{t+h} r(x(s), a) \, ds \le 0$.

Let $h \to 0$:

$v_t(x, t) + \nabla_x v(x(t), t) \cdot \dot x(t) + r(x(t), a) \le 0$.

But $x(\cdot)$ solves the ODE

$\begin{cases} \dot x(s) = f(x(s), a) & (t \le s \le t + h) \\ x(t) = x. \end{cases}$

Employ this above, to discover:

$v_t(x, t) + f(x, a) \cdot \nabla_x v(x, t) + r(x, a) \le 0$.

This inequality holds for all control parameters $a \in A$, and consequently

(5.6)   $\max_{a \in A} \{ v_t(x, t) + f(x, a) \cdot \nabla_x v(x, t) + r(x, a) \} \le 0$.

3. We next demonstrate that in fact the maximum above equals zero. To see
this, suppose $\alpha^*(\cdot)$, $x^*(\cdot)$ were optimal for the problem above. Let us utilize the
optimal control $\alpha^*(\cdot)$ for $t \le s \le t + h$. The payoff is

$\int_t^{t+h} r(x^*(s), \alpha^*(s)) \, ds$

and the remaining payoff is $v(x^*(t + h), t + h)$. Consequently, the total payoff is

$\int_t^{t+h} r(x^*(s), \alpha^*(s)) \, ds + v(x^*(t + h), t + h) = v(x, t)$.

Rearrange and divide by $h$:

$\frac{v(x^*(t + h), t + h) - v(x, t)}{h} + \frac{1}{h} \int_t^{t+h} r(x^*(s), \alpha^*(s)) \, ds = 0$.

Let $h \to 0$ and suppose $\alpha^*(t) = a^* \in A$. Then

$v_t(x, t) + \nabla_x v(x, t) \cdot \underbrace{\dot x^*(t)}_{f(x, a^*)} + r(x, a^*) = 0$;

and therefore

$v_t(x, t) + f(x, a^*) \cdot \nabla_x v(x, t) + r(x, a^*) = 0$

for some parameter value $a^* \in A$. This proves (HJB). □



5.1.3 THE DYNAMIC PROGRAMMING METHOD 

Here is how to use the dynamic programming method to design optimal controls:

Step 1: Solve the Hamilton-Jacobi-Bellman equation, and thereby compute
the value function $v$.

Step 2: Use the value function $v$ and the Hamilton-Jacobi-Bellman PDE to
design an optimal feedback control $\alpha^*(\cdot)$, as follows. Define for each point $x \in \mathbb{R}^n$
and each time $0 \le t \le T$,

$\alpha(x, t) = a \in A$

to be a parameter value where the maximum in (HJB) is attained. In other words,
we select $\alpha(x, t)$ so that

$v_t(x, t) + f(x, \alpha(x, t)) \cdot \nabla_x v(x, t) + r(x, \alpha(x, t)) = 0$.

Next we solve the following ODE, assuming $\alpha(\cdot, t)$ is sufficiently regular to let us
do so:

(ODE)   $\begin{cases} \dot x^*(s) = f(x^*(s), \alpha(x^*(s), s)) & (t \le s \le T) \\ x^*(t) = x. \end{cases}$

Finally, define the feedback control

(5.7)   $\alpha^*(s) := \alpha(x^*(s), s)$.

In summary, we design the optimal control this way: If the state of the system is $x$
at time $t$, use the control which at time $t$ takes on the parameter value $a \in A$ such
that the maximum in (HJB) is attained.

We demonstrate next that this construction does indeed provide us with an 
optimal control. 

THEOREM 5.2 (VERIFICATION OF OPTIMALITY). The control $\alpha^*(\cdot)$
defined by the construction (5.7) is optimal.

Proof. We have

$P_{x,t}[\alpha^*(\cdot)] = \int_t^T r(x^*(s), \alpha^*(s)) \, ds + g(x^*(T))$.

Furthermore according to the definition (5.7) of $\alpha^*(\cdot)$:

$P_{x,t}[\alpha^*(\cdot)] = \int_t^T \left( -v_t(x^*(s), s) - f(x^*(s), \alpha^*(s)) \cdot \nabla_x v(x^*(s), s) \right) ds + g(x^*(T))$
$\quad = -\int_t^T v_t(x^*(s), s) + \nabla_x v(x^*(s), s) \cdot \dot x^*(s) \, ds + g(x^*(T))$
$\quad = -\int_t^T \frac{d}{ds} v(x^*(s), s) \, ds + g(x^*(T))$
$\quad = -v(x^*(T), T) + v(x^*(t), t) + g(x^*(T))$
$\quad = -g(x^*(T)) + v(x^*(t), t) + g(x^*(T))$
$\quad = v(x, t) = \sup_{\alpha(\cdot) \in \mathcal{A}} P_{x,t}[\alpha(\cdot)]$.

That is,

$P_{x,t}[\alpha^*(\cdot)] = \sup_{\alpha(\cdot) \in \mathcal{A}} P_{x,t}[\alpha(\cdot)]$;

and so $\alpha^*(\cdot)$ is optimal, as asserted. □

5.2 EXAMPLES 

5.2.1 EXAMPLE 1: DYNAMICS WITH THREE VELOCITIES. Let
us begin with a fairly easy problem:

(ODE)   $\begin{cases} \dot x(s) = \alpha(s) & (0 \le t \le s \le 1) \\ x(t) = x \end{cases}$

where our set of control parameters is

$A = \{-1, 0, 1\}$.

We want to minimize

$\int_t^1 |x(s)| \, ds$,

and so take for our payoff functional

(P)   $P_{x,t}[\alpha(\cdot)] = -\int_t^1 |x(s)| \, ds$.

As our first illustration of dynamic programming, we will compute the value
function $v(x, t)$ and confirm that it does indeed solve the appropriate
Hamilton-Jacobi-Bellman equation. To do this, we first introduce the three regions:

[Figure: the three regions, bounded by the lines $t = 0$, $t = 1$, $x = t - 1$ and $x = 1 - t$.]

• Region I = $\{(x, t) \mid x < t - 1, \; 0 \le t \le 1\}$.

• Region II = $\{(x, t) \mid t - 1 < x < 1 - t, \; 0 \le t \le 1\}$.

• Region III = $\{(x, t) \mid x > 1 - t, \; 0 \le t \le 1\}$.

We will consider the three cases as to which region the initial data $(x, t)$ lie
within.

Region III. In this case we should take $\alpha \equiv -1$, to steer as close to the origin
as quickly as possible. (See the next picture.) Then

$v(x, t) = -$ area under path taken $= -(1 - t) \cdot \frac{1}{2}(x + x + t - 1) = -\frac{(1 - t)}{2}(2x + t - 1)$.

[Figure: optimal path in Region III, drawn against the $t$ and $x$ axes.]

Region I. In this region, we should take $\alpha \equiv 1$, in which case we can similarly
compute $v(x, t) = -\left(\frac{1 - t}{2}\right)(-2x + t - 1)$.

[Figure: optimal path in Region II, drawn against the $t$ and $x$ axes.]

Region II. In this region we take $\alpha \equiv \pm 1$, until we hit the origin, after which
we take $\alpha \equiv 0$. We therefore calculate that $v(x, t) = -\frac{x^2}{2}$ in this region.



Checking the Hamilton-Jacobi-Bellman PDE. Now the Hamilton-Jacobi-Bellman
equation for our problem reads

$$v_t + \max_{a\in A}\{f\cdot\nabla_x v + r\} = 0 \tag{5.8}$$

for $f = a$, $r = -|x|$. We rewrite this as

$$v_t + \max_{a=\pm 1,0}\{a v_x\} - |x| = 0;$$

and so

$$v_t + |v_x| - |x| = 0. \tag{HJB}$$

We must check that the value function v, defined explicitly above in Regions I-III,
does in fact solve this PDE, with the terminal condition that $v(x,1) = g(x) = 0$.

Now in Region II $v = -\frac{x^2}{2}$, $v_t = 0$, $v_x = -x$. Hence

$$v_t + |v_x| - |x| = 0 + |-x| - |x| = 0 \quad\text{in Region II},$$

in accordance with (HJB).

In Region III we have

$$v(x,t) = -\frac{(1-t)}{2}(2x + t - 1);$$

and therefore

$$v_t = \frac{1}{2}(2x + t - 1) - \frac{(1-t)}{2} = x - 1 + t, \qquad v_x = t - 1, \qquad |t - 1| = 1 - t \ge 0.$$

Hence

$$v_t + |v_x| - |x| = x - 1 + t + |t - 1| - |x| = 0 \quad\text{in Region III},$$

because $x > 0$ there.

Likewise the Hamilton-Jacobi-Bellman PDE holds in Region I. 
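The check above can also be done numerically. The following is only a minimal sketch: it codes the three-region formula for v from this example and estimates $v_t + |v_x| - |x|$ by central differences. The function names, the step size h, and the sample points (chosen by us to lie strictly inside each region) are assumptions made for illustration.

```python
def value(x, t):
    """Value function of Example 1 on the strip 0 <= t <= 1."""
    if x > 1 - t:        # Region III: steer with alpha = -1
        return -0.5 * (1 - t) * (2 * x + t - 1)
    if x < t - 1:        # Region I: steer with alpha = +1
        return -0.5 * (1 - t) * (-2 * x + t - 1)
    return -0.5 * x * x  # Region II: reach the origin, then stay there


def hjb_residual(x, t, h=1e-6):
    """Central-difference estimate of v_t + |v_x| - |x|."""
    v_t = (value(x, t + h) - value(x, t - h)) / (2 * h)
    v_x = (value(x + h, t) - value(x - h, t)) / (2 * h)
    return v_t + abs(v_x) - abs(x)


# Hypothetical sample points, each strictly inside one region.
samples = [(-1.5, 0.3), (-0.2, 0.4), (0.1, 0.7), (0.9, 0.5), (2.0, 0.2)]
residuals = [hjb_residual(x, t) for x, t in samples]
```

Each entry of `residuals` should vanish up to rounding error, since v is a quadratic polynomial inside each region and central differences are exact for quadratics.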

REMARKS. (i) In the example, v is not continuously differentiable on the
borderlines between Regions II and I or III.

(ii) In general, it may not be possible actually to find the optimal feedback
control. For example, reconsider the above problem, but now with $A = \{-1, 1\}$.
We still have $\alpha = \operatorname{sgn}(v_x)$, but there is no optimal control in Region II. □



5.2.2 EXAMPLE 2: ROCKET RAILROAD CAR. Recall that the equations
of motion in this model are

$$
\begin{pmatrix}\dot{x}_1\\ \dot{x}_2\end{pmatrix}
= \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}
\begin{pmatrix}x_1\\ x_2\end{pmatrix}
+ \begin{pmatrix}0\\ 1\end{pmatrix}\alpha, \qquad |\alpha| \le 1
$$

and

$$P[\alpha(\cdot)] = -\,\text{time to reach } (0,0) = -\int_0^\tau 1\,dt = -\tau.$$

To use the method of dynamic programming, we define $v(x_1,x_2)$ to be minus
the least time it takes to get to the origin (0, 0), given we start at the point $(x_1,x_2)$.

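What is the Hamilton-Jacobi-Bellman equation? Note v does not depend on t,
and so we have

$$\max_{a\in A}\{\mathbf{f}\cdot\nabla_x v + r\} = 0$$

for

$$A = [-1,1], \qquad \mathbf{f} = \begin{pmatrix}x_2\\ a\end{pmatrix}, \qquad r = -1.$$

Hence our PDE reads

$$\max_{|a|\le 1}\{x_2 v_{x_1} + a v_{x_2} - 1\} = 0;$$

and consequently

$$
\begin{cases}
x_2 v_{x_1} + |v_{x_2}| - 1 = 0 & \text{in } \mathbb{R}^2 \\
v(0,0) = 0.
\end{cases}
\tag{HJB}
$$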



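Checking the Hamilton-Jacobi-Bellman PDE. We now confirm that v
really satisfies (HJB). For this, define the regions

$$I := \{(x_1,x_2) \mid x_1 \ge -\tfrac{1}{2}x_2|x_2|\} \quad\text{and}\quad II := \{(x_1,x_2) \mid x_1 \le -\tfrac{1}{2}x_2|x_2|\}.$$

A direct computation, the details of which we omit, reveals that

$$
v(x) = \begin{cases}
-x_2 - 2\left(x_1 + \tfrac{1}{2}x_2^2\right)^{1/2} & \text{in Region I} \\
x_2 - 2\left(-x_1 + \tfrac{1}{2}x_2^2\right)^{1/2} & \text{in Region II.}
\end{cases}
$$

In Region I we compute

$$v_{x_2} = -1 - \left(x_1 + \frac{x_2^2}{2}\right)^{-1/2} x_2, \qquad v_{x_1} = -\left(x_1 + \frac{x_2^2}{2}\right)^{-1/2};$$

and therefore

$$x_2 v_{x_1} + |v_{x_2}| - 1 = -x_2\left(x_1 + \frac{x_2^2}{2}\right)^{-1/2} + \left[1 + x_2\left(x_1 + \frac{x_2^2}{2}\right)^{-1/2}\right] - 1 = 0.$$

This confirms that our (HJB) equation holds in Region I, and a similar calculation
holds in Region II.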
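For readers who want to see the omitted Region II calculation confirmed, here is a small numerical sketch. It codes the two-region formula for v stated above and estimates $x_2 v_{x_1} + |v_{x_2}| - 1$ by central differences. The function names, the step size, and the sample points are our own illustrative choices, not part of the original computation.

```python
import math


def value(x1, x2):
    """v(x1, x2): minus the least time to steer (x1, x2) to the origin."""
    if x1 >= -0.5 * x2 * abs(x2):                        # Region I
        return -x2 - 2.0 * math.sqrt(x1 + 0.5 * x2 * x2)
    return x2 - 2.0 * math.sqrt(-x1 + 0.5 * x2 * x2)     # Region II


def hjb_residual(x1, x2, h=1e-6):
    """Central-difference estimate of x2 * v_x1 + |v_x2| - 1."""
    v1 = (value(x1 + h, x2) - value(x1 - h, x2)) / (2 * h)
    v2 = (value(x1, x2 + h) - value(x1, x2 - h)) / (2 * h)
    return x2 * v1 + abs(v2) - 1.0


# Hypothetical sample points: the first three lie in Region I,
# the last two in Region II, all away from the switching curve.
samples = [(1.0, 0.5), (2.0, -1.0), (-2.0, 3.0), (-1.0, 0.5), (-2.0, -1.0)]
residuals = [hjb_residual(x1, x2) for x1, x2 in samples]
```

As a sanity check on the formula itself, `value(1.0, 0.0)` returns -2: starting at rest one unit from the origin, the car accelerates for one time unit and brakes for one more.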
Optimal control. Since

$$\max_{|a|\le 1}\{x_2 v_{x_1} + a v_{x_2} - 1\} = 0,$$

the optimal control is

$$\alpha = \operatorname{sgn} v_{x_2}.$$

□

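5.2.3 EXAMPLE 3: GENERAL LINEAR-QUADRATIC REGULATOR

For this important problem, we are given matrices $M, B, D \in \mathbb{M}^{n\times n}$, $N \in \mathbb{M}^{n\times m}$, $C \in \mathbb{M}^{m\times m}$; and assume

B, C, D are symmetric and nonnegative definite,

and

C is invertible.

We take the linear dynamics

$$
\begin{cases}
\dot{\mathbf{x}}(s) = M\mathbf{x}(s) + N\boldsymbol{\alpha}(s) & (t \le s \le T) \\
\mathbf{x}(t) = x,
\end{cases}
\tag{ODE}
$$

for which we want to minimize the quadratic cost functional

$$\int_t^T \mathbf{x}(s)^T B\,\mathbf{x}(s) + \boldsymbol{\alpha}(s)^T C\,\boldsymbol{\alpha}(s)\,ds + \mathbf{x}(T)^T D\,\mathbf{x}(T).$$

So we must maximize the payoff

$$P_{x,t}[\boldsymbol{\alpha}(\cdot)] = -\int_t^T \mathbf{x}(s)^T B\,\mathbf{x}(s) + \boldsymbol{\alpha}(s)^T C\,\boldsymbol{\alpha}(s)\,ds - \mathbf{x}(T)^T D\,\mathbf{x}(T). \tag{P}$$

The control values are unconstrained, meaning that the control parameter values
can range over all of $A = \mathbb{R}^m$.

We will solve by dynamic programming the problem of designing an optimal
control. To carry out this plan, we first compute the Hamilton-Jacobi-Bellman
equation

$$v_t + \max_{a\in\mathbb{R}^m}\{\mathbf{f}\cdot\nabla_x v + r\} = 0,$$

where

$$
\begin{cases}
\mathbf{f} = Mx + Na \\
r = -x^T B x - a^T C a \\
g = -x^T D x.
\end{cases}
$$

Rewrite:

$$v_t + \max_{a\in\mathbb{R}^m}\{(\nabla v)^T N a - a^T C a\} + (\nabla v)^T M x - x^T B x = 0. \tag{HJB}$$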




We also have the terminal condition

$$v(x,T) = -x^T D x.$$

Maximization. For what value of the control parameter a is the maximum
attained? To understand this, we define $Q(a) := (\nabla v)^T N a - a^T C a$, and determine
where Q has a maximum by computing the partial derivatives $Q_{a_j}$ for $j = 1,\dots,m$
and setting them equal to 0. This gives the identities

$$Q_{a_j} = \sum_{i=1}^n v_{x_i} n_{ij} - 2\sum_{i=1}^m a_i c_{ij} = 0.$$

Therefore $(\nabla v)^T N = 2a^T C$, and then $2C^T a = N^T \nabla v$. But $C^T = C$. Therefore

$$a = \frac{1}{2} C^{-1} N^T \nabla_x v.$$

This is the formula for the optimal feedback control: It will be very useful once we
compute the value function v.

Finding the value function. We insert our formula $a = \frac{1}{2}C^{-1}N^T\nabla v$ into
(HJB), and this PDE then reads

$$
\begin{cases}
v_t + \frac{1}{4}(\nabla v)^T N C^{-1} N^T \nabla v + (\nabla v)^T M x - x^T B x = 0 \\
v(x,T) = -x^T D x.
\end{cases}
\tag{HJB}
$$

Our next move is to guess the form of the solution, namely

$$v(x,t) = x^T K(t) x,$$

provided the symmetric n × n-matrix valued function K(·) is properly selected. Will
this guess work?

Now, since $-x^T K(T) x = -v(x,T) = x^T D x$, we must have the terminal condition that

$$K(T) = -D.$$

Next, compute that

$$v_t = x^T \dot{K}(t) x, \qquad \nabla_x v = 2K(t)x.$$

We insert our guess $v = x^T K(t) x$ into (HJB), and discover that

$$x^T\{\dot{K}(t) + K(t) N C^{-1} N^T K(t) + 2K(t)M - B\}x = 0.$$

Look at the expression

$$2x^T K M x = x^T K M x + [x^T K M x]^T = x^T K M x + x^T M^T K x.$$

Then

$$x^T\{\dot{K} + K N C^{-1} N^T K + KM + M^T K - B\}x = 0.$$
This identity will hold if K(·) satisfies the matrix Riccati equation

$$
\begin{cases}
\dot{K}(t) + K(t) N C^{-1} N^T K(t) + K(t)M + M^T K(t) - B = 0 & (0 \le t < T) \\
K(T) = -D.
\end{cases}
\tag{R}
$$

In summary, if we can solve the Riccati equation (R), we can construct an
optimal feedback control

$$\boldsymbol{\alpha}^*(t) = C^{-1} N^T K(t)\,\mathbf{x}(t).$$

Furthermore, (R) in fact does have a solution, as explained for instance in the book
of Fleming-Rishel [F-R].
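To make (R) concrete, here is a minimal sketch for the scalar case n = m = 1. The parameter values M = 0, N = B = C = 1, D = 0 are assumptions picked for illustration; for them (R) reduces to $\dot K = 1 - K^2$, $K(T) = 0$, whose solution is $K(t) = -\tanh(T - t)$. The code integrates backward from the terminal time with the classical fourth-order Runge-Kutta method, which is our choice of integrator and not something the notes prescribe.

```python
import math


def riccati_rhs(K, M=0.0, N=1.0, B=1.0, C=1.0):
    """Scalar form of (R) solved for the derivative:
    K' = B - K*N*(1/C)*N*K - 2*M*K."""
    return B - K * N * (1.0 / C) * N * K - 2.0 * M * K


def solve_riccati_backward(T, D=0.0, steps=1000):
    """Integrate K from t = T (where K = -D) back to t = 0 with RK4.
    Returns [K(0), K(h), ..., K(T)] on a uniform grid."""
    h = T / steps
    K = -D
    values = [K]
    for _ in range(steps):
        # One RK4 step of size -h (we march backward in time).
        k1 = riccati_rhs(K)
        k2 = riccati_rhs(K - 0.5 * h * k1)
        k3 = riccati_rhs(K - 0.5 * h * k2)
        k4 = riccati_rhs(K - h * k3)
        K -= (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        values.append(K)
    values.reverse()
    return values


T = 2.0
Ks = solve_riccati_backward(T)
# Feedback gain at time 0: alpha*(0) = (1/C) * N * K(0) * x(0) = K(0) * x(0).
gain_at_zero = Ks[0]
```

Since K(t) is negative for t < T, the feedback $\alpha^* = K(t)x$ pushes the state toward the origin, as one expects for a regulator.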

5.3 DYNAMIC PROGRAMMING AND THE PONTRYAGIN MAXI- 
MUM PRINCIPLE 



5.3.1 THE METHOD OF CHARACTERISTICS. 

Assume $H : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ and consider this initial-value problem for the
Hamilton-Jacobi equation:

$$
\begin{cases}
u_t(x,t) + H(x, \nabla_x u(x,t)) = 0 & (x \in \mathbb{R}^n,\ 0 < t < T) \\
u(x,0) = g(x).
\end{cases}
\tag{HJ}
$$

A basic idea in PDE theory is to introduce some ordinary differential equations, 
the solution of which lets us compute the solution u. In particular, we want to find 
a curve x(-) along which we can, in principle at least, compute u(x, t). 

This section discusses this method of characteristics, to make clearer the con- 
nections between PDE theory and the Pontryagin Maximum Principle. 



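NOTATION.

$$
\mathbf{x}(t) = \begin{pmatrix} x^1(t) \\ \vdots \\ x^n(t) \end{pmatrix}, \qquad
\mathbf{p}(t) = \nabla_x u(\mathbf{x}(t),t) = \begin{pmatrix} p^1(t) \\ \vdots \\ p^n(t) \end{pmatrix}.
$$

Derivation of characteristic equations. We have

$$p^k(t) = u_{x_k}(\mathbf{x}(t),t),$$

and therefore

$$\dot{p}^k(t) = u_{x_k t}(\mathbf{x}(t),t) + \sum_{i=1}^n u_{x_k x_i}(\mathbf{x}(t),t)\cdot\dot{x}^i.$$

Now suppose u solves (HJ). We differentiate this PDE with respect to the variable
$x_k$:

$$u_{t x_k}(x,t) = -H_{x_k}(x, \nabla u(x,t)) - \sum_{i=1}^n H_{p_i}(x, \nabla u(x,t))\cdot u_{x_k x_i}(x,t).$$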
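Let x = x(t) and substitute above:

$$
\dot{p}^k(t) = -H_{x_k}\big(\mathbf{x}(t), \underbrace{\nabla_x u(\mathbf{x}(t),t)}_{\mathbf{p}(t)}\big)
+ \sum_{i=1}^n \Big(\dot{x}^i(t) - H_{p_i}\big(\mathbf{x}(t), \underbrace{\nabla_x u(\mathbf{x}(t),t)}_{\mathbf{p}(t)}\big)\Big)\, u_{x_k x_i}(\mathbf{x}(t),t).
$$

We can simplify this expression if we select x(·) so that

$$\dot{x}^i(t) = H_{p_i}(\mathbf{x}(t),\mathbf{p}(t)), \qquad (1 \le i \le n);$$

then

$$\dot{p}^k(t) = -H_{x_k}(\mathbf{x}(t),\mathbf{p}(t)), \qquad (1 \le k \le n).$$

These are Hamilton's equations, already discussed in a different context in §4.1:

$$
\begin{cases}
\dot{\mathbf{x}}(t) = \nabla_p H(\mathbf{x}(t),\mathbf{p}(t)) \\
\dot{\mathbf{p}}(t) = -\nabla_x H(\mathbf{x}(t),\mathbf{p}(t)).
\end{cases}
\tag{H}
$$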

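We next demonstrate that if we can solve (H), then this gives a solution to PDE
(HJ), satisfying the initial conditions u = g on t = 0. Set $p^0 = \nabla g(x^0)$. We solve
(H), with $\mathbf{x}(0) = x^0$ and $\mathbf{p}(0) = p^0$. Next, let us calculate

$$
\begin{aligned}
\frac{d}{dt}\,u(\mathbf{x}(t),t) &= u_t(\mathbf{x}(t),t) + \nabla_x u(\mathbf{x}(t),t)\cdot\dot{\mathbf{x}}(t) \\
&= -H\big(\mathbf{x}(t), \underbrace{\nabla_x u(\mathbf{x}(t),t)}_{\mathbf{p}(t)}\big) + \underbrace{\nabla_x u(\mathbf{x}(t),t)}_{\mathbf{p}(t)}\cdot\nabla_p H(\mathbf{x}(t),\mathbf{p}(t)) \\
&= -H(\mathbf{x}(t),\mathbf{p}(t)) + \mathbf{p}(t)\cdot\nabla_p H(\mathbf{x}(t),\mathbf{p}(t)).
\end{aligned}
$$

Note also $u(\mathbf{x}(0),0) = u(x^0,0) = g(x^0)$. Integrate, to compute u along the curve
x(·):

$$u(\mathbf{x}(t),t) = \int_0^t -H + \nabla_p H\cdot\mathbf{p}\;ds + g(x^0).$$

This gives us the solution, once we have calculated x(·) and p(·).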
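As an illustration of this recipe, consider the one-dimensional choice $H(x,p) = p^2/2$ and $g(x) = x^2/2$; both are assumptions made here only for the sake of an example. Then (H) gives $\dot p = 0$ and $\dot x = p$, and one can check by hand that $u(x,t) = \frac{x^2}{2(1+t)}$ solves (HJ). The sketch below integrates (H) with explicit Euler steps (our choice of scheme), accumulates the integral of $-H + p\,H_p$ along the characteristic, and compares the result with that formula.

```python
def hamilton_rhs(x, p):
    """Right-hand side of (H) for the illustrative H(x, p) = p**2 / 2."""
    dH_dp = p      # x' =  H_p
    dH_dx = 0.0    # p' = -H_x
    return dH_dp, -dH_dx


def solve_along_characteristic(x0, t_end, steps=1000):
    """Follow the characteristic from x(0) = x0, p(0) = g'(x0) and return
    (x(t_end), u(x(t_end), t_end)) for g(x) = x**2 / 2."""
    x, p = x0, x0          # p(0) = g'(x0) = x0
    u = 0.5 * x0 * x0      # u starts at g(x0)
    h = t_end / steps
    for _ in range(steps):
        dx, dp = hamilton_rhs(x, p)
        H = 0.5 * p * p
        u += h * (-H + p * p)   # integrand -H + p * H_p, with H_p = p
        x += h * dx
        p += h * dp
    return x, u


x_end, u_end = solve_along_characteristic(x0=1.0, t_end=1.0)
exact = x_end ** 2 / (2.0 * (1.0 + 1.0))
```

For this Hamiltonian the characteristics are straight lines, so the Euler scheme introduces no discretization error; for a general H a higher-order integrator would be the safer choice.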

5.3.2 CONNECTIONS BETWEEN DYNAMIC PROGRAMMING AND 
THE PONTRYAGIN MAXIMUM PRINCIPLE. 

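Return now to our usual control theory problem, with dynamics

$$
\begin{cases}
\dot{\mathbf{x}}(s) = \mathbf{f}(\mathbf{x}(s),\boldsymbol{\alpha}(s)) & (t \le s \le T) \\
\mathbf{x}(t) = x
\end{cases}
\tag{ODE}
$$

and payoff

$$P_{x,t}[\boldsymbol{\alpha}(\cdot)] = \int_t^T r(\mathbf{x}(s),\boldsymbol{\alpha}(s))\,ds + g(\mathbf{x}(T)). \tag{P}$$

As above, the value function is

$$v(x,t) = \sup_{\boldsymbol{\alpha}(\cdot)} P_{x,t}[\boldsymbol{\alpha}(\cdot)].$$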
«(■) 
The next theorem demonstrates that the costate in the Pontryagin Maximum
Principle is in fact the gradient in x of the value function v, taken along an optimal
trajectory:

THEOREM 5.3 (COSTATES AND GRADIENTS). Assume $\boldsymbol{\alpha}^*(\cdot)$, $\mathbf{x}^*(\cdot)$
solve the control problem (ODE), (P).

If the value function v is $C^2$, then the costate $\mathbf{p}^*(\cdot)$ occurring in the Maximum
Principle is given by

$$\mathbf{p}^*(s) = \nabla_x v(\mathbf{x}^*(s),s) \qquad (t \le s \le T).$$

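Proof. 1. As usual, suppress the superscript *. Define $\mathbf{p}(t) := \nabla_x v(\mathbf{x}(t),t)$.
We claim that p(·) satisfies conditions (ADJ) and (M) of the Pontryagin Maximum
Principle. To confirm this assertion, look at

$$\dot{p}^i(t) = \frac{d}{dt}\,v_{x_i}(\mathbf{x}(t),t) = v_{x_i t}(\mathbf{x}(t),t) + \sum_{j=1}^n v_{x_i x_j}(\mathbf{x}(t),t)\,\dot{x}^j(t).$$

We know v solves

$$v_t(x,t) + \max_{a\in A}\{\mathbf{f}(x,a)\cdot\nabla_x v(x,t) + r(x,a)\} = 0;$$

and, applying the optimal control α(·), we find:

$$v_t(\mathbf{x}(t),t) + \mathbf{f}(\mathbf{x}(t),\boldsymbol{\alpha}(t))\cdot\nabla_x v(\mathbf{x}(t),t) + r(\mathbf{x}(t),\boldsymbol{\alpha}(t)) = 0.$$

2. Now freeze the time t and define the function

$$h(x) := v_t(x,t) + \mathbf{f}(x,\boldsymbol{\alpha}(t))\cdot\nabla_x v(x,t) + r(x,\boldsymbol{\alpha}(t)) \le 0.$$

Observe that $h(\mathbf{x}(t)) = 0$. Consequently h(·) has a maximum at the point x = x(t);
and therefore for $i = 1,\dots,n$,

$$
\begin{aligned}
0 = h_{x_i}(\mathbf{x}(t)) = {} & v_{t x_i}(\mathbf{x}(t),t) + \mathbf{f}_{x_i}(\mathbf{x}(t),\boldsymbol{\alpha}(t))\cdot\nabla_x v(\mathbf{x}(t),t) \\
& + \mathbf{f}(\mathbf{x}(t),\boldsymbol{\alpha}(t))\cdot\nabla_x v_{x_i}(\mathbf{x}(t),t) + r_{x_i}(\mathbf{x}(t),\boldsymbol{\alpha}(t)).
\end{aligned}
$$

Substitute above:

$$\dot{p}^i(t) = v_{x_i t} + \sum_{j=1}^n v_{x_i x_j} f^j = v_{x_i t} + \mathbf{f}\cdot\nabla_x v_{x_i} = -\mathbf{f}_{x_i}\cdot\nabla_x v - r_{x_i}.$$

Recalling that $\mathbf{p}(t) = \nabla_x v(\mathbf{x}(t),t)$, we deduce that

$$\dot{\mathbf{p}}(t) = -(\nabla_x \mathbf{f})\mathbf{p} - \nabla_x r.$$

Recall also

$$H = \mathbf{f}\cdot p + r, \qquad \nabla_x H = (\nabla_x \mathbf{f})p + \nabla_x r.$$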
Hence

$$\dot{\mathbf{p}}(t) = -\nabla_x H(\mathbf{x}(t),\mathbf{p}(t),\boldsymbol{\alpha}(t)),$$

which is (ADJ).

3. Now we must check condition (M). According to (HJB),

$$v_t(\mathbf{x}(t),t) + \max_{a\in A}\{\mathbf{f}(\mathbf{x}(t),a)\cdot\nabla v(\mathbf{x}(t),t) + r(\mathbf{x}(t),a)\} = 0,$$

and the maximum occurs for $a = \boldsymbol{\alpha}(t)$. Hence

$$\max_{a\in A}\{H(\mathbf{x}(t),\mathbf{p}(t),a)\} = H(\mathbf{x}(t),\mathbf{p}(t),\boldsymbol{\alpha}(t));$$

and this is assertion (M) of the Maximum Principle. □

INTERPRETATIONS. The foregoing provides us with another way to look
at transversality conditions:

(i) Free endpoint problem: Recall that we stated earlier in Theorem 4.4
that for the free endpoint problem we have the condition

$$\mathbf{p}^*(T) = \nabla g(\mathbf{x}^*(T)) \tag{T}$$

for the payoff functional

$$\int_t^T r(\mathbf{x}(s),\boldsymbol{\alpha}(s))\,ds + g(\mathbf{x}(T)).$$

To understand this better, note $\mathbf{p}^*(s) = \nabla_x v(\mathbf{x}^*(s),s)$. But $v(x,T) = g(x)$, and
hence the foregoing implies

$$\mathbf{p}^*(T) = \nabla_x v(\mathbf{x}^*(T),T) = \nabla g(\mathbf{x}^*(T)).$$

(ii) Constrained initial and target sets:

Recall that for this problem we stated in Theorem 4.5 the transversality conditions that

$$
\begin{cases}
\mathbf{p}^*(0) \text{ is perpendicular to } T_0 \\
\mathbf{p}^*(\tau^*) \text{ is perpendicular to } T_1
\end{cases}
\tag{T}
$$

when $\tau^*$ denotes the first time the optimal trajectory hits the target set $X_1$.

Now let v be the value function for this problem:

$$v(x) = \sup_{\boldsymbol{\alpha}(\cdot)} P_x[\boldsymbol{\alpha}(\cdot)],$$

with the constraint that we start at $x^0 \in X_0$ and end at $x^1 \in X_1$. But then v will
be constant on the set $X_0$ and also constant on $X_1$. Since $\nabla v$ is perpendicular to
any level surface, $\nabla v$ is therefore perpendicular to both $\partial X_0$ and $\partial X_1$. And since

$$\mathbf{p}^*(t) = \nabla v(\mathbf{x}^*(t)),$$

this means that

$$
\begin{cases}
\mathbf{p}^* \text{ is perpendicular to } \partial X_0 \text{ at } t = 0, \\
\mathbf{p}^* \text{ is perpendicular to } \partial X_1 \text{ at } t = \tau^*.
\end{cases}
$$

□



5.4 REFERENCES 

See the book [B-CD] by M. Bardi and I. Capuzzo-Dolcetta for more about 
the modern theory of PDE methods in dynamic programming. Barron and Jensen 
present in [B-J] a proof of Theorem 5.3 that does not require v to be C 2 . 
CHAPTER 6: DIFFERENTIAL GAMES



6.1 Definitions 

6.2 Dynamic Programming 

6.3 Games and the Pontryagin Maximum Principle 

6.4 Application: war of attrition and attack 

6.5 References 

6.1 DEFINITIONS 

We introduce in this section a model for a two-person, zero-sum differential 
game. The basic idea is that two players control the dynamics of some evolving 
system, and one tries to maximize, the other to minimize, a payoff functional that 
depends upon the trajectory. 

What are optimal strategies for each player? This is a very tricky question, 
primarily since at each moment of time, each player's control decisions will depend 
upon what the other has done previously. 

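A MODEL PROBLEM. Let a time $0 \le t < T$ be given, along with sets $A \subseteq \mathbb{R}^m$,
$B \subseteq \mathbb{R}^l$ and a function $\mathbf{f} : \mathbb{R}^n \times A \times B \to \mathbb{R}^n$.

DEFINITION. A measurable mapping $\boldsymbol{\alpha}(\cdot) : [t,T] \to A$ is a control for player
I (starting at time t). A measurable mapping $\boldsymbol{\beta}(\cdot) : [t,T] \to B$ is a control for player
II.

Corresponding to each pair of controls, we have the dynamics

$$
\begin{cases}
\dot{\mathbf{x}}(s) = \mathbf{f}(\mathbf{x}(s),\boldsymbol{\alpha}(s),\boldsymbol{\beta}(s)) & (t \le s \le T) \\
\mathbf{x}(t) = x,
\end{cases}
\tag{ODE}
$$

the initial point $x \in \mathbb{R}^n$ being given.

DEFINITION. The payoff of the game is

$$P_{x,t}[\boldsymbol{\alpha}(\cdot),\boldsymbol{\beta}(\cdot)] := \int_t^T r(\mathbf{x}(s),\boldsymbol{\alpha}(s),\boldsymbol{\beta}(s))\,ds + g(\mathbf{x}(T)). \tag{P}$$

Player I, whose control is α(·), wants to maximize the payoff functional P[·].
Player II has the control β(·) and wants to minimize P[·]. This is a two-person,
zero-sum differential game.

We intend now to define value functions and to study the game using dynamic
programming.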
DEFINITION. The sets of controls for the game are

$$A(t) := \{\boldsymbol{\alpha}(\cdot) : [t,T] \to A,\ \boldsymbol{\alpha}(\cdot) \text{ measurable}\}$$
$$B(t) := \{\boldsymbol{\beta}(\cdot) : [t,T] \to B,\ \boldsymbol{\beta}(\cdot) \text{ measurable}\}.$$

We need to model the fact that at each time instant, neither player knows the
other's future moves. We will use the concept of strategies, as employed by Varaiya
and Elliott-Kalton. The idea is that one player will select in advance, not his
control, but rather his responses to all possible controls that could be selected by
his opponent.

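DEFINITIONS. (i) A mapping $\Phi : B(t) \to A(t)$ is called a strategy for player
I if for all times $t \le s \le T$,

$$\boldsymbol{\beta}(\tau) \equiv \hat{\boldsymbol{\beta}}(\tau) \quad\text{for } t \le \tau \le s$$

implies

$$\Phi[\boldsymbol{\beta}](\tau) \equiv \Phi[\hat{\boldsymbol{\beta}}](\tau) \quad\text{for } t \le \tau \le s. \tag{6.1}$$

We can think of $\Phi[\boldsymbol{\beta}]$ as the response of player I to player II's selection of
control β(·). Condition (6.1) expresses that player I cannot foresee the future.

(ii) A strategy for player II is a mapping $\Psi : A(t) \to B(t)$ such that for all times
$t \le s \le T$,

$$\boldsymbol{\alpha}(\tau) \equiv \hat{\boldsymbol{\alpha}}(\tau) \quad\text{for } t \le \tau \le s$$

implies

$$\Psi[\boldsymbol{\alpha}](\tau) \equiv \Psi[\hat{\boldsymbol{\alpha}}](\tau) \quad\text{for } t \le \tau \le s.$$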

DEFINITION. The sets of strategies are

$$\mathcal{A}(t) := \text{strategies for player I (starting at } t)$$
$$\mathcal{B}(t) := \text{strategies for player II (starting at } t).$$

Finally, we introduce value functions:

DEFINITION. The lower value function is

$$v(x,t) := \inf_{\Psi\in\mathcal{B}(t)}\ \sup_{\boldsymbol{\alpha}(\cdot)\in A(t)} P_{x,t}[\boldsymbol{\alpha}(\cdot),\Psi[\boldsymbol{\alpha}](\cdot)], \tag{6.2}$$

and the upper value function is

$$u(x,t) := \sup_{\Phi\in\mathcal{A}(t)}\ \inf_{\boldsymbol{\beta}(\cdot)\in B(t)} P_{x,t}[\Phi[\boldsymbol{\beta}](\cdot),\boldsymbol{\beta}(\cdot)]. \tag{6.3}$$

One of the two players announces his strategy in response to the other's choice
of control, the other player chooses the control. The player who "plays second",
i.e., who chooses the strategy, has an advantage. In fact, it turns out that always

$$v(x,t) \le u(x,t).$$



6.2 DYNAMIC PROGRAMMING, ISAACS' EQUATIONS 

THEOREM 6.1 (PDE FOR THE UPPER AND LOWER VALUE FUNCTIONS).
Assume u, v are continuously differentiable. Then u solves the upper
Isaacs' equation

$$
\begin{cases}
u_t + \min_{b\in B}\max_{a\in A}\{\mathbf{f}(x,a,b)\cdot\nabla_x u(x,t) + r(x,a,b)\} = 0 \\
u(x,T) = g(x),
\end{cases}
\tag{6.4}
$$

and v solves the lower Isaacs' equation

$$
\begin{cases}
v_t + \max_{a\in A}\min_{b\in B}\{\mathbf{f}(x,a,b)\cdot\nabla_x v(x,t) + r(x,a,b)\} = 0 \\
v(x,T) = g(x).
\end{cases}
\tag{6.5}
$$

Isaacs' equations are analogs of the Hamilton-Jacobi-Bellman equation in
two-person, zero-sum control theory. We can rewrite these in the forms

$$u_t + H^+(x,\nabla_x u) = 0$$

for the upper PDE Hamiltonian

$$H^+(x,p) := \min_{b\in B}\max_{a\in A}\{\mathbf{f}(x,a,b)\cdot p + r(x,a,b)\};$$

and

$$v_t + H^-(x,\nabla_x v) = 0$$

for the lower PDE Hamiltonian

$$H^-(x,p) := \max_{a\in A}\min_{b\in B}\{\mathbf{f}(x,a,b)\cdot p + r(x,a,b)\}.$$

INTERPRETATIONS AND REMARKS. (i) In general, we have

$$\max_{a\in A}\min_{b\in B}\{\mathbf{f}(x,a,b)\cdot p + r(x,a,b)\} \le \min_{b\in B}\max_{a\in A}\{\mathbf{f}(x,a,b)\cdot p + r(x,a,b)\},$$

and consequently $H^-(x,p) \le H^+(x,p)$. The upper and lower Isaacs' equations are
then different PDE and so in general the upper and lower value functions are not
the same: $u \ne v$.

The precise interpretation of this is tricky, but the idea is to think of a slightly
different situation in which the two players take turns exerting their controls over
short time intervals. In this situation, it is a disadvantage to go first, since the other
player then knows what control is selected. The value function u represents a sort
of "infinitesimal" version of this situation, for which player I has the advantage.
The value function v represents the reverse situation, for which player II has the
advantage.

If however

$$\max_{a\in A}\min_{b\in B}\{\mathbf{f}(\cdots)\cdot p + r(\cdots)\} = \min_{b\in B}\max_{a\in A}\{\mathbf{f}(\cdots)\cdot p + r(\cdots)\}, \tag{6.6}$$

for all p, x, we say the game satisfies the minimax condition, also called Isaacs'
condition. In this case it turns out that u = v and we say the game has value.

(ii) As in dynamic programming from control theory, if (6.6) holds, we can solve 
Isaacs' equation for u = v and then, at least in principle, design optimal controls 
for players I and II. 

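(iii) To say that $\boldsymbol{\alpha}^*(\cdot)$, $\boldsymbol{\beta}^*(\cdot)$ are optimal means that the pair $(\boldsymbol{\alpha}^*(\cdot),\boldsymbol{\beta}^*(\cdot))$ is a
saddle point for $P_{x,t}$. This means

$$P_{x,t}[\boldsymbol{\alpha}(\cdot),\boldsymbol{\beta}^*(\cdot)] \le P_{x,t}[\boldsymbol{\alpha}^*(\cdot),\boldsymbol{\beta}^*(\cdot)] \le P_{x,t}[\boldsymbol{\alpha}^*(\cdot),\boldsymbol{\beta}(\cdot)] \tag{6.7}$$

for all controls α(·), β(·). Player I will select α*(·) because he is afraid II will play
β*(·). Player II will play β*(·), because she is afraid I will play α*(·). □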
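The gap between max-min and min-max in remark (i), and its disappearance under a condition like (6.6), can be seen already for finite control sets. The following sketch is only a toy analog: the payoff tables stand in for the expression $\mathbf{f}\cdot p + r$ at one fixed (x, p), rows index a, columns index b, and all numbers and function names are invented for illustration.

```python
def lower_value(payoff):
    """max over rows a of (min over columns b): the analog of H^-."""
    return max(min(row) for row in payoff)


def upper_value(payoff):
    """min over columns b of (max over rows a): the analog of H^+."""
    columns = list(zip(*payoff))
    return min(max(col) for col in columns)


# A payoff in which a and b interact: max-min is strictly below min-max.
coupled = [[1, -1],
           [-1, 1]]

# A payoff of the separated form phi(a) + psi(b): the two values agree.
phi = [0.0, 2.0, 1.0]
psi = [3.0, -1.0]
separable = [[pa + pb for pb in psi] for pa in phi]

gap_coupled = upper_value(coupled) - lower_value(coupled)
gap_separable = upper_value(separable) - lower_value(separable)
```

The separated form is the situation in which the maximization over a and the minimization over b can be carried out independently of one another.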

6.3 GAMES AND THE PONTRYAGIN MAXIMUM PRINCIPLE 

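Assume the minimax condition (6.6) holds and we design optimal $\boldsymbol{\alpha}^*(\cdot)$, $\boldsymbol{\beta}^*(\cdot)$
as above. Let $\mathbf{x}^*(\cdot)$ denote the solution of the (ODE) of §6.1, corresponding to our
controls $\boldsymbol{\alpha}^*(\cdot)$, $\boldsymbol{\beta}^*(\cdot)$. Then define

$$\mathbf{p}^*(t) := \nabla_x v(\mathbf{x}^*(t),t) = \nabla_x u(\mathbf{x}^*(t),t).$$

It turns out that

$$\dot{\mathbf{p}}^*(t) = -\nabla_x H(\mathbf{x}^*(t),\mathbf{p}^*(t),\boldsymbol{\alpha}^*(t),\boldsymbol{\beta}^*(t)) \tag{ADJ}$$

for the game-theory Hamiltonian

$$H(x,p,a,b) := \mathbf{f}(x,a,b)\cdot p + r(x,a,b).$$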



6.4 APPLICATION: WAR OF ATTRITION AND ATTACK. 

In this section we work out an example, due to R. Isaacs [I]. 
6.4.1 STATEMENT OF PROBLEM. We assume that two opponents I
and II are at war with each other. Let us define

$x^1(t)$ = supply of resources for I
$x^2(t)$ = supply of resources for II.

Each player at each time can devote some fraction of his/her efforts to direct attack,
and the remaining fraction to attrition (= guerrilla warfare). Set A = B = [0,1],
and define

$\alpha(t)$ = fraction of I's effort devoted to attrition
$1 - \alpha(t)$ = fraction of I's effort devoted to attack
$\beta(t)$ = fraction of II's effort devoted to attrition
$1 - \beta(t)$ = fraction of II's effort devoted to attack.

We introduce as well the parameters

$m_1$ = rate of production of war material for I
$m_2$ = rate of production of war material for II
$c_1$ = effectiveness of II's weapons against I's production
$c_2$ = effectiveness of I's weapons against II's production.

We will assume

$$c_2 > c_1,$$

a hypothesis that introduces an asymmetry into the problem.

The dynamics are governed by the system of ODE

$$
\begin{cases}
\dot{x}^1(t) = m_1 - c_1\beta(t)x^2(t) \\
\dot{x}^2(t) = m_2 - c_2\alpha(t)x^1(t).
\end{cases}
\tag{6.8}
$$

Let us finally introduce the payoff functional

$$P[\alpha(\cdot),\beta(\cdot)] = \int_0^T (1-\alpha(t))x^1(t) - (1-\beta(t))x^2(t)\,dt,$$

the integrand recording the advantage of I over II from direct attacks at time t.
Player I wants to maximize P, and player II wants to minimize P.

6.4.2 APPLYING DYNAMIC PROGRAMMING. First, we check the
minimax condition, for n = 2, $p = (p_1,p_2)$:

$$
\begin{aligned}
\mathbf{f}(x,a,b)\cdot p + r(x,a,b) &= (m_1 - c_1 b x_2)p_1 + (m_2 - c_2 a x_1)p_2 + (1-a)x_1 - (1-b)x_2 \\
&= m_1 p_1 + m_2 p_2 + x_1 - x_2 + a(-x_1 - c_2 x_1 p_2) + b(x_2 - c_1 x_2 p_1).
\end{aligned}
$$
Since a and b occur in separate terms, the minimax condition holds. Therefore
v = u and the two forms of the Isaacs' equations agree:

$$v_t + H(x,\nabla_x v) = 0,$$

for

$$H(x,p) := H^+(x,p) = H^-(x,p).$$

We recall A = B = [0, 1] and $p = \nabla_x v$, and then choose $a \in [0,1]$ to maximize

$$a x_1(-1 - c_2 v_{x_2}).$$

Likewise, we select $b \in [0,1]$ to minimize

$$b x_2(1 - c_1 v_{x_1}).$$

Thus

$$
\alpha = \begin{cases}
1 & \text{if } -1 - c_2 v_{x_2} \ge 0 \\
0 & \text{if } -1 - c_2 v_{x_2} < 0,
\end{cases}
\tag{6.9}
$$

and

$$
\beta = \begin{cases}
0 & \text{if } 1 - c_1 v_{x_1} \ge 0 \\
1 & \text{if } 1 - c_1 v_{x_1} < 0.
\end{cases}
\tag{6.10}
$$

So if we knew the value function v, we could then design optimal feedback controls
for I, II.

It is however hard to solve Isaacs' equation for v, and so we switch approaches.

6.4.3 APPLYING THE MAXIMUM PRINCIPLE. Assume α(·), β(·)
are selected as above, and x(·) is the corresponding solution of the ODE (6.8). Define

$$\mathbf{p}(t) := \nabla_x v(\mathbf{x}(t),t).$$

By results stated above, p(·) solves the adjoint equation

$$\dot{\mathbf{p}}(t) = -\nabla_x H(\mathbf{x}(t),\mathbf{p}(t),\alpha(t),\beta(t)) \tag{6.11}$$

for

$$
\begin{aligned}
H(x,p,a,b) &= p\cdot\mathbf{f}(x,a,b) + r(x,a,b) \\
&= p_1(m_1 - c_1 b x_2) + p_2(m_2 - c_2 a x_1) + (1-a)x_1 - (1-b)x_2.
\end{aligned}
$$

Therefore (6.11) reads

$$
\begin{cases}
\dot{p}^1 = \alpha - 1 + p^2 c_2 \alpha \\
\dot{p}^2 = 1 - \beta + p^1 c_1 \beta,
\end{cases}
\tag{6.12}
$$

with the terminal conditions $p^1(T) = p^2(T) = 0$.
We introduce the further notation

$$s^1 := -1 - c_2 v_{x_2} = -1 - c_2 p^2, \qquad s^2 := 1 - c_1 v_{x_1} = 1 - c_1 p^1;$$

so that, according to (6.9) and (6.10), the functions $s^1$ and $s^2$ control when player
I and player II switch their controls.

Dynamics for $s^1$ and $s^2$. Our goal now is to find ODE for $s^1$, $s^2$. We compute

$$\dot{s}^1 = -c_2\dot{p}^2 = c_2(\beta - 1 - p^1 c_1\beta) = c_2(-1 + \beta(1 - p^1 c_1)) = c_2(-1 + \beta s^2)$$

and

$$\dot{s}^2 = -c_1\dot{p}^1 = c_1(1 - \alpha - p^2 c_2\alpha) = c_1(1 + \alpha(-1 - p^2 c_2)) = c_1(1 + \alpha s^1).$$

Therefore

$$
\begin{cases}
\dot{s}^1 = c_2(-1 + \beta s^2), & s^1(T) = -1 \\
\dot{s}^2 = c_1(1 + \alpha s^1), & s^2(T) = 1.
\end{cases}
\tag{6.13}
$$

Recall from (6.9) and (6.10) that

$$
\alpha = \begin{cases} 1 & \text{if } s^1 \ge 0 \\ 0 & \text{if } s^1 < 0, \end{cases}
\qquad
\beta = \begin{cases} 1 & \text{if } s^2 \le 0 \\ 0 & \text{if } s^2 > 0. \end{cases}
$$

Consequently, if we can find $s^1$, $s^2$, then we can construct the optimal controls α
and β.

Calculating $s^1$ and $s^2$. We work backwards from the terminal time T. Since
at time T, we have $s^1 < 0$ and $s^2 > 0$, the same inequalities hold near T. Hence we
have $\alpha = \beta \equiv 0$ near T, meaning a full attack from both sides.

Next, let $t^* < T$ be the first time going backward from T at which either I or
II switches strategy. Our intention is to compute $t^*$. On the time interval $[t^*,T]$,
we have $\alpha(\cdot) \equiv \beta(\cdot) \equiv 0$. Thus (6.13) gives

$$\dot{s}^1 = -c_2, \quad s^1(T) = -1, \qquad \dot{s}^2 = c_1, \quad s^2(T) = 1;$$

and therefore

$$s^1(t) = -1 + c_2(T - t), \qquad s^2(t) = 1 + c_1(t - T)$$

for times $t^* \le t \le T$. Hence $s^1$ hits 0 at time $T - \frac{1}{c_2}$; $s^2$ hits 0 at time $T - \frac{1}{c_1}$.
Remember that we are assuming $c_2 > c_1$. Then $T - \frac{1}{c_1} < T - \frac{1}{c_2}$, and hence

$$t^* = T - \frac{1}{c_2}.$$
c 2 
Now define $t_* < t^*$ to be the next time going backward when player I or player
II switches. On the time interval $[t_*,t^*]$, we have $\alpha \equiv 1$, $\beta \equiv 0$. Therefore the
dynamics read:

$$
\begin{cases}
\dot{s}^1 = -c_2, & s^1(t^*) = 0 \\
\dot{s}^2 = c_1(1 + s^1), & s^2(t^*) = 1 - \frac{c_1}{c_2}.
\end{cases}
$$

We solve these equations and discover that

$$
\begin{cases}
s^1(t) = -1 + c_2(T - t) \\
s^2(t) = 1 - \frac{c_1}{2c_2} - \frac{c_1 c_2}{2}(t - T)^2
\end{cases}
\qquad (t_* \le t \le t^*).
$$

Now $s^1 > 0$ on $[t_*,t^*]$ for all choices of $t_*$. But $s^2 = 0$ at

$$t_* := T - \frac{1}{c_2}\left(\frac{2c_2}{c_1} - 1\right)^{1/2}.$$

If we now solve (6.13) on $[0,t_*]$ with $\alpha = \beta \equiv 1$, we learn that $s^1$, $s^2$ do not
change sign.
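The switching structure just derived can be checked by integrating (6.13) backward from T numerically. The sketch below uses explicit Euler steps and the switching rules for α and β recalled above; the parameter values $c_1 = 1$, $c_2 = 2$, $T = 3$, the step count, and the function names are assumptions chosen only for illustration.

```python
import math


def switching_times(c1, c2, T):
    """Closed-form switching times from the text (assumes c2 > c1 > 0)."""
    t_upper = T - 1.0 / c2
    t_lower = T - (1.0 / c2) * math.sqrt(2.0 * c2 / c1 - 1.0)
    return t_lower, t_upper


def integrate_backward(c1, c2, T, steps=200000):
    """Integrate (6.13) from t = T down to t = 0 with explicit Euler,
    returning the times at which alpha or beta changes value."""
    h = T / steps
    s1, s2, t = -1.0, 1.0, T      # terminal conditions of (6.13)
    alpha, beta = 0.0, 0.0        # full attack near the terminal time
    switches = []
    for _ in range(steps):
        ds1 = c2 * (-1.0 + beta * s2)
        ds2 = c1 * (1.0 + alpha * s1)
        s1 -= h * ds1             # step backward in time
        s2 -= h * ds2
        t -= h
        new_alpha = 1.0 if s1 >= 0 else 0.0
        new_beta = 1.0 if s2 <= 0 else 0.0
        if (new_alpha, new_beta) != (alpha, beta):
            switches.append(t)
            alpha, beta = new_alpha, new_beta
    return switches


t_lower, t_upper = switching_times(1.0, 2.0, 3.0)
switches = integrate_backward(1.0, 2.0, 3.0)
```

With these parameters the integration should report exactly two switches: player I changes to attrition near $t^* = 2.5$, and player II follows near $t_* = 3 - \frac{\sqrt{3}}{2}$, with no further changes down to t = 0.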

CRITIQUE. We have assumed that $x^1 > 0$ and $x^2 > 0$ for all times t. If either
$x^1$ or $x^2$ hits the constraint, then there will be a corresponding Lagrange multiplier
and everything becomes much more complicated. □

6.5 REFERENCES 

See Isaacs' classic book [I] or the more recent book by Lewin [L] for many more 
worked examples. 
CHAPTER 7: INTRODUCTION TO STOCHASTIC CONTROL THEORY



7.1 Introduction and motivation 

7.2 Review of probability theory, Brownian motion 

7.3 Stochastic differential equations 

7.4 Stochastic calculus, Ito chain rule 

7.5 Dynamic programming 

7.6 Application: optimal portfolio selection 

7.7 References 



7.1 INTRODUCTION AND MOTIVATION 

This chapter provides a very quick look at the dynamic programming method 
in stochastic control theory. The rigorous mathematics involved here is really quite 
subtle, far beyond the scope of these notes. And so we suggest that readers new to 
these topics just scan the following sections, which are intended only as an informal 
introduction. 



7.1.1 STOCHASTIC DIFFERENTIAL EQUATIONS. We begin with a brief
overview of random differential equations. Consider a vector field $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ and
the associated ODE

$$
\begin{cases}
\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t)) & (t > 0) \\
\mathbf{x}(0) = x^0.
\end{cases}
\tag{7.1}
$$

In many cases a better model for some physical phenomenon we want to study
is the stochastic differential equation

$$
\begin{cases}
\dot{\mathbf{X}}(t) = \mathbf{f}(\mathbf{X}(t)) + \sigma\boldsymbol{\xi}(t) & (t > 0) \\
\mathbf{X}(0) = x^0,
\end{cases}
\tag{7.2}
$$

where ξ(·) denotes a "white noise" term causing random fluctuations. We have
switched notation to a capital letter X(·) to indicate that the solution is random.
A solution of (7.2) is a collection of sample paths of a stochastic process, plus
probabilistic information as to the likelihoods of the various paths.

7.1.2 STOCHASTIC CONTROL THEORY. Now assume $\mathbf{f} : \mathbb{R}^n \times A \to \mathbb{R}^n$
and turn attention to the controlled stochastic differential equation:

$$
\begin{cases}
\dot{\mathbf{X}}(s) = \mathbf{f}(\mathbf{X}(s),\mathbf{A}(s)) + \boldsymbol{\xi}(s) & (t \le s \le T) \\
\mathbf{X}(t) = x^0.
\end{cases}
\tag{SDE}
$$
DEFINITIONS. (i) A control A(·) is a mapping of [t, T] into A, such that
for each time $t \le s \le T$, A(s) depends only on s and observations of X(τ) for
$t \le \tau \le s$.

(ii) The corresponding payoff functional is

$$P_{x,t}[\mathbf{A}(\cdot)] = E\left\{\int_t^T r(\mathbf{X}(s),\mathbf{A}(s))\,ds + g(\mathbf{X}(T))\right\}, \tag{P}$$

the expected value over all sample paths for the solution of (SDE). As usual, we are
given the running payoff r and terminal payoff g.

BASIC PROBLEM. Our goal is to find an optimal control A*(·), such that

$$P_{x,t}[\mathbf{A}^*(\cdot)] = \max_{\mathbf{A}(\cdot)} P_{x,t}[\mathbf{A}(\cdot)].$$

DYNAMIC PROGRAMMING. We will adapt the dynamic programming methods
from Chapter 5. To do so, we firstly define the value function

$$v(x,t) := \sup_{\mathbf{A}(\cdot)} P_{x,t}[\mathbf{A}(\cdot)].$$

The overall plan to find an optimal control A*(·) will be (i) to find a Hamilton-
Jacobi-Bellman type of PDE that v satisfies, and then (ii) to utilize a solution of
this PDE in designing A*.

It will be particularly interesting to see in §7.5 how the stochastic effects modify 
the structure of the Hamilton- Jacobi-Bellman (HJB) equation, as compared with 
the deterministic case already discussed in Chapter 5. 

7.2 REVIEW OF PROBABILITY THEORY, BROWNIAN MOTION. 

This and the next two sections provide a very, very rapid introduction to math- 
ematical probability theory and stochastic differential equations. The discussion 
will be much too fast for novices, whom we advise to just scan these sections. See 
§7.7 for some suggested reading to learn more. 

DEFINITION. A probability space is a triple $(\Omega, \mathcal{F}, P)$, where

(i) $\Omega$ is a set,

(ii) $\mathcal{F}$ is a σ-algebra of subsets of $\Omega$,

(iii) P is a mapping from $\mathcal{F}$ into [0, 1] such that $P(\emptyset) = 0$, $P(\Omega) = 1$, and
$P(\cup_{i=1}^\infty A_i) = \sum_{i=1}^\infty P(A_i)$, provided $A_i \cap A_j = \emptyset$ for all $i \ne j$.

A typical point in $\Omega$ is denoted "ω" and is called a sample point. A set $A \in \mathcal{F}$
is called an event. We call P a probability measure on $\Omega$, and $P(A) \in [0,1]$ is
the probability of the event A.
DEFINITION. A random variable X is a mapping $X : \Omega \to \mathbb{R}$ such that for
all $t \in \mathbb{R}$

$$\{\omega \mid X(\omega) \le t\} \in \mathcal{F}.$$

We mostly employ capital letters to denote random variables. Often the dependence
of X on ω is not explicitly displayed in the notation.

DEFINITION. Let X be a random variable, defined on some probability space
$(\Omega, \mathcal{F}, P)$. The expected value of X is

$$E[X] := \int_\Omega X\,dP.$$

EXAMPLE. Assume $\Omega \subseteq \mathbb{R}^m$, and $P(A) = \int_A f\,d\omega$ for some function $f : \mathbb{R}^m \to [0,\infty)$,
with $\int_\Omega f\,d\omega = 1$. We then call f the density of the probability P, and write
"$dP = f\,d\omega$". In this case,

$$E[X] = \int_\Omega X f\,d\omega.$$

□

DEFINITION. We define also the variance

$$\mathrm{Var}(X) = E[(X - E(X))^2] = E[X^2] - (E[X])^2.$$

IMPORTANT EXAMPLE. A random variable X is called normal (or Gaussian)
with mean μ, variance $\sigma^2$ if for all $-\infty \le a < b \le \infty$

$$P(a \le X \le b) = \frac{1}{\sqrt{2\pi\sigma^2}}\int_a^b e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx.$$

We write "X is $N(\mu,\sigma^2)$".

DEFINITIONS. (i) Two events $A, B \in \mathcal{F}$ are called independent if

$$P(A \cap B) = P(A)P(B).$$

(ii) Two random variables X and Y are independent if

$$P(X \le t \text{ and } Y \le s) = P(X \le t)P(Y \le s)$$

for all $t, s \in \mathbb{R}$. In other words, X and Y are independent if for all t, s the events
$A = \{X \le t\}$ and $B = \{Y \le s\}$ are independent.

DEFINITION. A stochastic process is a collection of random variables X(t)
$(0 \le t < \infty)$, each defined on the same probability space $(\Omega, \mathcal{F}, P)$.

The mapping $t \mapsto X(t,\omega)$ is the ω-th sample path of the process.
DEFINITION. A real-valued stochastic process W(t) is called a Wiener process
or Brownian motion if

(i) W(0) = 0,

(ii) each sample path is continuous,

(iii) W(t) is Gaussian with $\mu = 0$, $\sigma^2 = t$ (that is, W(t) is N(0,t)),

(iv) for all choices of times $0 < t_1 < t_2 < \cdots < t_m$ the random variables

$$W(t_1),\ W(t_2) - W(t_1),\ \dots,\ W(t_m) - W(t_{m-1})$$

are independent random variables.

Assertion (iv) says that W has "independent increments".
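Properties (iii) and (iv) suggest a simple way to sample a Brownian path at finitely many times: add up independent Gaussian increments whose variance equals the elapsed time. The sketch below does this and then estimates the mean and variance of W(1), together with the covariance between W(0.5) and the increment W(1) - W(0.5). The seed, the number of paths, and the function name are arbitrary choices made here for illustration; since the sampler builds in independent increments by construction, the covariance estimate is a consistency check and not a proof of anything.

```python
import math
import random


def brownian_path(times, rng):
    """Sample W at the increasing positive times given, using independent
    Gaussian increments with variance equal to each time step."""
    w, t_prev, path = 0.0, 0.0, []
    for t in times:
        w += rng.gauss(0.0, math.sqrt(t - t_prev))
        path.append(w)
        t_prev = t
    return path


rng = random.Random(0)            # fixed seed, an arbitrary choice
times = [0.5, 1.0]
paths = [brownian_path(times, rng) for _ in range(20000)]
n = len(paths)

# Sample mean and variance of W(1); property (iii) predicts 0 and 1.
mean_w1 = sum(p[1] for p in paths) / n
var_w1 = sum(p[1] ** 2 for p in paths) / n - mean_w1 ** 2

# Sample covariance of W(0.5) and W(1) - W(0.5); (iv) predicts 0.
first = [p[0] for p in paths]
incr = [p[1] - p[0] for p in paths]
mean_first = sum(first) / n
mean_incr = sum(incr) / n
cov_incr = sum(a * b for a, b in zip(first, incr)) / n - mean_first * mean_incr
```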

INTERPRETATION. We heuristically interpret the one-dimensional "white noise"
ξ(·) as equalling $\frac{dW(t)}{dt}$. However, this is only formal, since for almost all ω, the
sample path $t \mapsto W(t,\omega)$ is in fact nowhere differentiable. □

DEFINITION. An n-dimensional Brownian motion is

$$\mathbf{W}(t) = (W^1(t), W^2(t), \dots, W^n(t))^T$$

when the $W^i(t)$ are independent one-dimensional Brownian motions.

We use boldface below to denote vector-valued functions and stochastic pro- 
cesses. 



7.3 STOCHASTIC DIFFERENTIAL EQUATIONS.

We discuss next how to understand stochastic differential equations, driven by
"white noise". Consider first of all

$$
\begin{cases}
\dot{\mathbf{X}}(t) = \mathbf{f}(\mathbf{X}(t)) + \sigma\boldsymbol{\xi}(t) & (t > 0) \\
\mathbf{X}(0) = x^0,
\end{cases}
\tag{7.3}
$$

where we informally think of $\boldsymbol{\xi} = \dot{\mathbf{W}}$.

DEFINITION. A stochastic process X(·) solves (7.3) if for all times $t \ge 0$, we
have

$$\mathbf{X}(t) = x^0 + \int_0^t \mathbf{f}(\mathbf{X}(s))\,ds + \sigma\mathbf{W}(t). \tag{7.4}$$

REMARKS. (i) It is possible to solve (7.4) by the method of successive approximation.
For this, we set $\mathbf{X}^0(\cdot) \equiv x^0$, and inductively define

$$\mathbf{X}^{k+1}(t) := x^0 + \int_0^t \mathbf{f}(\mathbf{X}^k(s))\,ds + \sigma\mathbf{W}(t).$$
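It turns out that $\mathbf{X}^k(t)$ converges to a limit X(t) for all $t \ge 0$ and X(·) solves the
integral identity (7.4).

(ii) Consider a more general SDE

$$\dot{\mathbf{X}}(t) = \mathbf{f}(\mathbf{X}(t)) + \mathbf{H}(\mathbf{X}(t))\boldsymbol{\xi}(t) \qquad (t > 0), \tag{7.5}$$

which we formally rewrite to read:

$$\frac{d\mathbf{X}(t)}{dt} = \mathbf{f}(\mathbf{X}(t)) + \mathbf{H}(\mathbf{X}(t))\frac{d\mathbf{W}(t)}{dt}$$

and then

$$d\mathbf{X}(t) = \mathbf{f}(\mathbf{X}(t))\,dt + \mathbf{H}(\mathbf{X}(t))\,d\mathbf{W}(t).$$

This is an Itô stochastic differential equation. By analogy with the foregoing, we
say X(·) is a solution, with the initial condition $\mathbf{X}(0) = x^0$, if

$$\mathbf{X}(t) = x^0 + \int_0^t \mathbf{f}(\mathbf{X}(s))\,ds + \int_0^t \mathbf{H}(\mathbf{X}(s))\cdot d\mathbf{W}(s)$$

for all times $t \ge 0$. In this expression $\int_0^t \mathbf{H}(\mathbf{X})\cdot d\mathbf{W}$ is called an Itô stochastic
integral. □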
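The successive approximation scheme of remark (i) is easy to try on a single sampled Brownian path. The sketch below works in one dimension on a uniform time grid, replaces the integral in (7.4) by a left-endpoint Riemann sum, and records how far each iterate moves from the previous one. The drift f(x) = -x, the values of $x^0$ and σ, the grid, the seed, and all names are assumptions made for illustration only.

```python
import math
import random


def picard_iterates(f, x0, sigma, W, h, iterations):
    """Successive approximation for X(t) = x0 + int_0^t f(X) ds + sigma*W(t)
    on a uniform grid of step h; W[j] is the sampled path at time j*h.
    Returns the last iterate and the sup-norm gaps between iterates."""
    n = len(W)
    X = [x0] * n                       # X^0(.) = x0
    gaps = []
    for _ in range(iterations):
        X_next, integral = [], 0.0
        for j in range(n):
            X_next.append(x0 + integral + sigma * W[j])
            integral += h * f(X[j])    # left-endpoint Riemann sum
        gaps.append(max(abs(a - b) for a, b in zip(X_next, X)))
        X = X_next
    return X, gaps


# One sampled Brownian path on [0, 1]; seed and grid are arbitrary choices.
rng = random.Random(1)
h, n_steps = 1e-3, 1000
W = [0.0]
for _ in range(n_steps):
    W.append(W[-1] + rng.gauss(0.0, math.sqrt(h)))

X, gaps = picard_iterates(lambda x: -x, x0=1.0, sigma=0.5, W=W, h=h, iterations=20)
```

The recorded gaps shrink roughly like $T^k/k!$ for a drift with Lipschitz constant 1, which is the same mechanism that makes the iteration converge for ordinary differential equations.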

REMARK. Given a Brownian motion W(·) it is possible to define the Itô stochastic
integral

$$\int_0^t \mathbf{Y}\cdot d\mathbf{W}$$

for processes Y(·) having the property that for each time $0 \le s \le t$ "Y(s) depends
on W(τ) for times $0 \le \tau \le s$, but not on W(τ) for times $s \le \tau$". Such processes are
called "nonanticipating".

We will not here explain the construction of the Itô integral, but will just record
one of its useful properties:

$$E\left(\int_0^t \mathbf{Y}\cdot d\mathbf{W}\right) = 0. \tag{7.6}$$

□



7.4 STOCHASTIC CALCULUS, ITO CHAIN RULE. 

Once the Ito stochastic integral is defined, we have in effect constructed a new 
calculus, the properties of which we should investigate. This section explains that 
the chain rule in the Ito calculus contains additional terms as compared with the 
usual chain rule. These extra stochastic corrections will lead to modifications of the 
(HJB) equation in §7.5. 
7.4.1 ONE DIMENSION. We suppose that n = 1 and

$$
\begin{cases}
dX(t) = A(t)\,dt + B(t)\,dW(t) & (t \ge 0) \\
X(0) = x^0.
\end{cases}
\tag{7.7}
$$

The expression (7.7) means that

$$X(t) = x^0 + \int_0^t A(s)\,ds + \int_0^t B(s)\,dW(s)$$

for all times $t \ge 0$.

Let $u : \mathbb{R} \to \mathbb{R}$ and define

$$Y(t) := u(X(t)).$$

We ask: what is the law of motion governing the evolution of Y in time? Or, in
other words, what is dY(t)?

It turns out, quite surprisingly, that it is incorrect to calculate

$$dY(t) = d(u(X(t))) = u'(X(t))\,dX(t) = u'(X(t))(A(t)\,dt + B(t)\,dW(t)).$$

ITÔ CHAIN RULE. We try again and make use of the heuristic principle that
"$dW = (dt)^{1/2}$". So let us expand u into a Taylor's series, keeping only terms of
order dt or larger. Then

$$
\begin{aligned}
dY(t) &= d(u(X(t))) \\
&= u'(X(t))\,dX(t) + \frac{1}{2}u''(X(t))\,dX(t)^2 + \frac{1}{6}u'''(X(t))\,dX(t)^3 + \dots \\
&= u'(X(t))[A(t)\,dt + B(t)\,dW(t)] + \frac{1}{2}u''(X(t))[A(t)\,dt + B(t)\,dW(t)]^2 + \dots,
\end{aligned}
$$

the last line following from (7.7). Now, formally at least, the heuristic that $dW = (dt)^{1/2}$ implies

$$
\begin{aligned}
[A(t)\,dt + B(t)\,dW(t)]^2 &= A(t)^2\,dt^2 + 2A(t)B(t)\,dt\,dW(t) + B^2(t)\,dW(t)^2 \\
&= B^2(t)\,dt + o(dt).
\end{aligned}
$$

Thus, ignoring the o(dt) term, we derive the one-dimensional Ito chain rule

(7.8) \[ dY(t) = d(u(X(t))) = \left[ u'(X(t))A(t) + \tfrac{1}{2}B^2(t)u''(X(t)) \right] dt + u'(X(t))B(t)\,dW(t). \]

This means that for each time $t > 0$

\[ u(X(t)) = Y(t) = Y(0) + \int_0^t \left[ u'(X(s))A(s) + \tfrac{1}{2}B^2(s)u''(X(s)) \right] ds + \int_0^t u'(X(s))B(s)\,dW(s). \]
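The extra term in (7.8) can be seen numerically. The following is a minimal sketch, not part of the original notes: it takes the special case A = 0, B = 1 (so X = W) and u(x) = x^2, for which (7.8) predicts d(W^2) = dt + 2W dW, and compares W(T)^2 on one simulated path against both this prediction and the naive chain rule d(W^2) = 2W dW. The function name `ito_check`, the step count, and the seed are illustrative assumptions.

```python
import math
import random


def ito_check(T=1.0, n_steps=100_000, seed=0):
    """Compare W(T)^2 with the Ito and naive chain-rule predictions on one path."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = 0.0        # current value W(t_k)
    ito_sum = 0.0  # left-endpoint sum of 2 W(t_k) [W(t_{k+1}) - W(t_k)]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += 2.0 * w * dw  # integrand evaluated before the increment (nonanticipating)
        w += dw
    lhs = w * w                   # u(X(T)) for u(x) = x^2 and X = W
    ito_prediction = T + ito_sum  # from (7.8): d(W^2) = dt + 2 W dW
    naive_prediction = ito_sum    # ordinary chain rule: d(W^2) = 2 W dW
    return lhs, ito_prediction, naive_prediction


if __name__ == "__main__":
    lhs, ito, naive = ito_check()
    print("W(T)^2          :", lhs)
    print("Ito prediction  :", ito)
    print("naive prediction:", naive)
```

The gap between W(T)^2 and the naive prediction is the sum of the squared increments, which concentrates near T; this is the discrete counterpart of the heuristic dW^2 = dt.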



7.4.2 HIGHER DIMENSIONS. We turn now to stochastic differential equations
in higher dimensions. For simplicity, we consider only the special form

(7.9) \[ \begin{cases} dX(t) = A(t)\,dt + \sigma\,dW(t) & (t \ge 0) \\ X(0) = x^0. \end{cases} \]

We write

\[ X(t) = (X^1(t), X^2(t), \dots, X^n(t))^T. \]

The stochastic differential equation means that for each index i, we have
$dX^i(t) = A^i(t)\,dt + \sigma\,dW^i(t)$.

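ITO CHAIN RULE AGAIN. Let $u : \mathbb{R}^n \times [0,\infty) \to \mathbb{R}$ and put

\[ Y(t) := u(X(t),t). \]

What is dY? Similarly to the computation above, we calculate

\[ \begin{aligned}
dY(t) &= d[u(X(t),t)] \\
&= u_t(X(t),t)\,dt + \sum_{i=1}^n u_{x_i}(X(t),t)\,dX^i(t) + \frac{1}{2}\sum_{i,j=1}^n u_{x_i x_j}(X(t),t)\,dX^i(t)\,dX^j(t).
\end{aligned} \]

Now use (7.9) and the heuristic rules that

\[ dW^i = (dt)^{1/2} \quad \text{and} \quad dW^i\,dW^j = \begin{cases} dt & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases} \]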

The second rule holds since the components of dW are independent. Plug these
identities into the calculation above and keep only terms of order dt or larger:

(7.10) \[ \begin{aligned}
dY(t) &= u_t(X(t),t)\,dt + \sum_{i=1}^n u_{x_i}(X(t),t)\,[A^i(t)\,dt + \sigma\,dW^i(t)] + \frac{\sigma^2}{2}\sum_{i=1}^n u_{x_i x_i}(X(t),t)\,dt \\
&= u_t(X(t),t)\,dt + \nabla_x u(X(t),t) \cdot [A(t)\,dt + \sigma\,dW(t)] + \frac{\sigma^2}{2}\Delta u(X(t),t)\,dt.
\end{aligned} \]

This is Ito's chain rule in n dimensions. Here

\[ \Delta = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2} \]

denotes the Laplacian.

7.4.3 APPLICATIONS TO PDE. 

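A. A stochastic representation formula for harmonic functions. Consider
a region $U \subset \mathbb{R}^n$ and the boundary-value problem

(7.11) \[ \begin{cases} \Delta u = 0 & (x \in U) \\ u = g & (x \in \partial U) \end{cases} \]

where, as above, $\Delta = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}$ is the Laplacian. We call u a harmonic function.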

We develop a stochastic representation formula for the solution of (7.11). Consider
the random process $X(t) = W(t) + x$; that is,

\[ \begin{cases} dX(t) = dW(t) & (t > 0) \\ X(0) = x \end{cases} \]

and $W(\cdot)$ denotes an n-dimensional Brownian motion. To find the link with the
PDE (7.11), we define $Y(t) := u(X(t))$. Then Ito's rule (7.10) gives

\[ dY(t) = \nabla u(X(t)) \cdot dW(t) + \tfrac{1}{2}\Delta u(X(t))\,dt. \]

Since $\Delta u \equiv 0$, we have

\[ dY(t) = \nabla u(X(t)) \cdot dW(t); \]

which means

\[ u(X(t)) = Y(t) = Y(0) + \int_0^t \nabla u(X(s)) \cdot dW(s). \]

Let $\tau$ denote the (random) first time the sample path hits $\partial U$. Then, putting
$t = \tau$ above, we have

\[ u(x) = u(X(\tau)) - \int_0^\tau \nabla u \cdot dW(s). \]

But $u(X(\tau)) = g(X(\tau))$, by definition of $\tau$. Next, average over all sample paths:

\[ u(x) = E[g(X(\tau))] - E\left[ \int_0^\tau \nabla u \cdot dW \right]. \]

The last term equals zero, according to (7.6). Consequently,

\[ u(x) = E[g(X(\tau))]. \]

INTERPRETATION. Consider all the sample paths of the Brownian motion
starting at x and take the average of $g(X(\tau))$. This gives the value of u at x. □
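The interpretation above can be tried out in one dimension. The sketch below is an illustration under stated assumptions rather than the construction in the notes: it replaces Brownian motion by a symmetric random walk on the grid {0, 1, ..., n_grid}, a discrete stand-in for which the exit probabilities are known exactly (gambler's ruin). On the interval (0, 1) the harmonic function with boundary values g(0) and g(1) is linear, so the average of g over exit points should be close to g(0) + (g(1) - g(0)) x. The function name `exit_average` and all parameter values are hypothetical.

```python
import random


def exit_average(start, n_grid, g_left, g_right, n_paths=20_000, seed=0):
    """Average the boundary data g over exit points of a symmetric random walk.

    The walk lives on {0, 1, ..., n_grid} and starts at the integer `start`;
    it stands in for Brownian motion started at x = start / n_grid in (0, 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        k = start
        while 0 < k < n_grid:  # run until the path hits the boundary
            k += 1 if rng.random() < 0.5 else -1
        total += g_right if k == n_grid else g_left
    return total / n_paths


if __name__ == "__main__":
    # Boundary data g(0) = 2, g(1) = 5; start at x = 0.3.
    estimate = exit_average(start=3, n_grid=10, g_left=2.0, g_right=5.0)
    exact = 2.0 + (5.0 - 2.0) * 0.3  # the linear (harmonic) interpolant at x = 0.3
    print(estimate, exact)
```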

B. A time-dependent problem. We next modify the previous calculation to
cover the terminal-value problem for the inhomogeneous backwards heat equation:

(7.11) \[ \begin{cases} u_t(x,t) + \frac{\sigma^2}{2}\Delta u(x,t) = f(x,t) & (x \in \mathbb{R}^n,\ 0 \le t < T) \\ u(x,T) = g(x). \end{cases} \]

Fix $x \in \mathbb{R}^n$, $0 \le t < T$. We introduce the stochastic process

\[ \begin{cases} dX(s) = \sigma\,dW(s) & (s \ge t) \\ X(t) = x. \end{cases} \]

Use Ito's chain rule (7.10) to compute $du(X(s),s)$:

\[ du(X(s),s) = u_s(X(s),s)\,ds + \nabla_x u(X(s),s) \cdot dX(s) + \frac{\sigma^2}{2}\Delta u(X(s),s)\,ds. \]

Now integrate for times $t \le s \le T$, to discover

\[ u(X(T),T) = u(X(t),t) + \int_t^T \left[ \frac{\sigma^2}{2}\Delta u(X(s),s) + u_s(X(s),s) \right] ds + \int_t^T \sigma \nabla_x u(X(s),s) \cdot dW(s). \]

Then, since u solves (7.11):

\[ u(x,t) = E\left( g(X(T)) - \int_t^T f(X(s),s)\,ds \right). \]

This is a stochastic representation formula for the solution u of the PDE (7.11).

□
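A quick Monte Carlo check of this representation formula is possible in a special case. The sketch below assumes n = 1, a source term f identically zero, and terminal data g(x) = x^2; under these assumptions u(x,t) = x^2 + sigma^2 (T - t) solves the terminal-value problem, as one verifies by differentiating. The function name `heat_representation` and the numerical parameters are illustrative and not from the notes.

```python
import math
import random


def heat_representation(x, t, T, sigma, g, n_paths=100_000, seed=0):
    """Estimate E[g(X(T))] for dX = sigma dW, X(t) = x (case f = 0, n = 1).

    Since X(T) = x + sigma (W(T) - W(t)) is Gaussian, it can be sampled directly.
    """
    rng = random.Random(seed)
    std = sigma * math.sqrt(T - t)
    total = 0.0
    for _ in range(n_paths):
        total += g(x + std * rng.gauss(0.0, 1.0))
    return total / n_paths


if __name__ == "__main__":
    x, t, T, sigma = 1.0, 0.25, 1.0, 0.5
    estimate = heat_representation(x, t, T, sigma, g=lambda y: y * y)
    exact = x * x + sigma ** 2 * (T - t)  # closed form for this special case
    print(estimate, exact)
```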

7.5 DYNAMIC PROGRAMMING. 

We now turn our attention to controlled stochastic differential equations, of the
form

(SDE) \[ \begin{cases} dX(s) = f(X(s),A(s))\,ds + \sigma\,dW(s) & (t \le s \le T) \\ X(t) = x. \end{cases} \]

Therefore

\[ X(\tau) = x + \int_t^\tau f(X(s),A(s))\,ds + \sigma[W(\tau) - W(t)] \]

for all $t \le \tau \le T$. We introduce as well the expected payoff functional

(P) \[ P_{x,t}[A(\cdot)] := E\left\{ \int_t^T r(X(s),A(s))\,ds + g(X(T)) \right\}. \]

The value function is

\[ v(x,t) := \sup_{A(\cdot) \in \mathcal{A}} P_{x,t}[A(\cdot)]. \]

We will employ the method of dynamic programming. To do so, we must (i)
find a PDE satisfied by v, and then (ii) use this PDE to design an optimal control
$A^*(\cdot)$.

7.5.1 A PDE FOR THE VALUE FUNCTION. 

Let $A(\cdot)$ be any control, and suppose we use it for times $t \le s \le t + h$, $h > 0$,
and thereafter employ an optimal control. Then

(7.12) \[ v(x,t) \ge E\left\{ \int_t^{t+h} r(X(s),A(s))\,ds + v(X(t+h),t+h) \right\}, \]

and the inequality in (7.12) becomes an equality if we take $A(\cdot) = A^*(\cdot)$, an optimal
control.

Now from (7.12) we see for an arbitrary control that

\[ \begin{aligned}
0 &\ge E\left\{ \int_t^{t+h} r(X(s),A(s))\,ds + v(X(t+h),t+h) - v(x,t) \right\} \\
&= E\left\{ \int_t^{t+h} r\,ds \right\} + E\{ v(X(t+h),t+h) - v(x,t) \}.
\end{aligned} \]
Recall next Ito's formula:

(7.13) \[ \begin{aligned}
dv(X(s),s) &= v_t(X(s),s)\,ds + \sum_{i=1}^n v_{x_i}(X(s),s)\,dX^i(s) + \frac{1}{2}\sum_{i,j=1}^n v_{x_i x_j}(X(s),s)\,dX^i(s)\,dX^j(s) \\
&= v_t\,ds + \nabla_x v \cdot (f\,ds + \sigma\,dW(s)) + \frac{\sigma^2}{2}\Delta v\,ds.
\end{aligned} \]

This means that

\[ v(X(t+h),t+h) - v(X(t),t) = \int_t^{t+h} \left( v_t + \nabla_x v \cdot f + \frac{\sigma^2}{2}\Delta v \right) ds + \int_t^{t+h} \sigma \nabla_x v \cdot dW(s); \]

and so we can take expected values, to deduce

(7.14) \[ E[v(X(t+h),t+h) - v(x,t)] = E\left[ \int_t^{t+h} \left( v_t + \nabla_x v \cdot f + \frac{\sigma^2}{2}\Delta v \right) ds \right]. \]



We derive therefore the formula

\[ 0 \ge E\left[ \int_t^{t+h} \left( r + v_t + \nabla_x v \cdot f + \frac{\sigma^2}{2}\Delta v \right) ds \right]. \]

Divide by h:

\[ 0 \ge E\left[ \frac{1}{h}\int_t^{t+h} r(X(s),A(s)) + v_t(X(s),s) + f(X(s),A(s)) \cdot \nabla_x v(X(s),s) + \frac{\sigma^2}{2}\Delta v(X(s),s)\,ds \right]. \]

If we send $h \to 0$, recall that $X(t) = x$ and set $A(t) := a \in A$, we see that

\[ 0 \ge r(x,a) + v_t(x,t) + f(x,a) \cdot \nabla_x v(x,t) + \frac{\sigma^2}{2}\Delta v(x,t). \]

The above inequality holds for all x, t, a and is actually an equality for the optimal
control. Hence

\[ \max_{a \in A} \left\{ v_t + f \cdot \nabla_x v + \frac{\sigma^2}{2}\Delta v + r \right\} = 0. \]

Stochastic Hamilton-Jacobi-Bellman equation. In summary, we have shown
that the value function v for our stochastic control problem solves this PDE:

(HJB) \[ \begin{cases} v_t(x,t) + \frac{\sigma^2}{2}\Delta v(x,t) + \max_{a \in A}\{ f(x,a) \cdot \nabla_x v(x,t) + r(x,a) \} = 0 \\ v(x,T) = g(x). \end{cases} \]

This semilinear parabolic PDE is the stochastic Hamilton-Jacobi-Bellman equation. 

Our derivation has been very imprecise: see the references for rigorous deriva- 
tions. 

7.5.2 DESIGNING AN OPTIMAL CONTROL. 

Assume now that we can somehow solve the (HJB) equation, and therefore
know the function v. We can then compute for each point (x, t) a value $a \in A$ for
which $\nabla_x v(x,t) \cdot f(x,a) + r(x,a)$ attains its maximum. In other words, for each
(x, t) we choose $a = \alpha(x,t)$ such that

\[ \max_{a \in A}\,[f(x,a) \cdot \nabla_x v(x,t) + r(x,a)] \]

occurs for $a = \alpha(x,t)$. Next solve

\[ \begin{cases} dX^*(s) = f(X^*(s),\alpha(X^*(s),s))\,ds + \sigma\,dW(s) \\ X^*(t) = x, \end{cases} \]

assuming this is possible. Then $A^*(s) = \alpha(X^*(s),s)$ is an optimal feedback control.
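The two steps just described (pick the maximizing control value pointwise, then run the closed-loop SDE) can be sketched in code. The following is a rough illustration under several assumptions that are not in the notes: the state is scalar, the control set is a finite list of candidate values, the closed-loop SDE is discretized by the Euler-Maruyama scheme, and the gradient of v is supplied by the caller. The sample gradient used below, -2x, is a hypothetical placeholder chosen only to exercise the argmax step; it is not claimed to come from a solution of (HJB).

```python
import math
import random


def feedback_control(x, t, grad_v, f, r, controls):
    """Return the candidate a maximizing f(x, a) * v_x(x, t) + r(x, a)."""
    return max(controls, key=lambda a: f(x, a) * grad_v(x, t) + r(x, a))


def simulate_closed_loop(x0, t0, T, sigma, grad_v, f, r, controls,
                         n_steps=1000, seed=0):
    """Euler-Maruyama for dX = f(X, alpha(X, s)) ds + sigma dW with the feedback above."""
    rng = random.Random(seed)
    dt = (T - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        a = feedback_control(x, t, grad_v, f, r, controls)
        x += f(x, a) * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        t += dt
    return x


if __name__ == "__main__":
    # Hypothetical data: dynamics f = a, no running payoff, placeholder gradient -2x.
    grad_v = lambda x, t: -2.0 * x
    f = lambda x, a: a
    r = lambda x, a: 0.0
    controls = [-1.0, 0.0, 1.0]
    print(feedback_control(2.0, 0.0, grad_v, f, r, controls))  # pushes x toward 0
    print(simulate_closed_loop(2.0, 0.0, 1.0, 0.3, grad_v, f, r, controls))
```

In practice v would come from a numerical solution of (HJB), and the maximization over a continuous control set would need an optimizer in place of the finite list.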



7.6 APPLICATION: OPTIMAL PORTFOLIO SELECTION. 
Following is an interesting example worked out by Merton. In this model we
have the option of investing some of our wealth in either a risk-free bond (growing at 
a fixed rate) or a risky stock (changing according to a random differential equation). 
We also intend to consume some of our wealth as time evolves. As time goes on, 
how can we best (i) allot our money among the investment opportunities and (ii) 
select how much to consume? 

We assume time runs from 0 to a terminal time T. Introduce the variables

X(t) = wealth at time t (random)
b(t) = price of a risk-free investment, say a bond
S(t) = price of a risky investment, say a stock (random)
$\alpha^1(t)$ = fraction of wealth invested in the stock
$\alpha^2(t)$ = rate at which wealth is consumed.

Then

(7.15) \[ 0 \le \alpha^1(t) \le 1, \quad 0 \le \alpha^2(t) \quad (0 \le t \le T). \]

We assume that the value of the bond grows at the known rate r > 0:

(7.16) \[ db = rb\,dt; \]

whereas the price of the risky stock changes according to

(7.17) \[ dS = RS\,dt + \sigma S\,dW. \]

Here r, R, $\sigma$ are constants, with

\[ R > r > 0, \quad \sigma \ne 0. \]

This means that the average return on the stock is greater than that for the risk-free
bond.

According to (7.16) and (7.17), the total wealth evolves as

(7.18) \[ dX = (1 - \alpha^1(t))Xr\,dt + \alpha^1(t)X(R\,dt + \sigma\,dW) - \alpha^2(t)\,dt. \]

Let

\[ Q := \{ (x,t) \mid 0 \le t \le T,\ x \ge 0 \} \]

and denote by $\tau$ the (random) first time $X(\cdot)$ leaves Q. Write $A(t) = (\alpha^1(t), \alpha^2(t))^T$
for the control.

The payoff functional to be maximized is

\[ P_{x,t}[A(\cdot)] = E\left( \int_t^\tau e^{-\rho s} F(\alpha^2(s))\,ds \right), \]

where F is a given utility function and $\rho > 0$ is the discount rate.

Guided by theory similar to that developed in §7.5, we discover that the
corresponding (HJB) equation is

(7.19) \[ u_t + \max_{0 \le a_1 \le 1,\ a_2 \ge 0} \left\{ \frac{(a_1 x \sigma)^2}{2} u_{xx} + ((1 - a_1)xr + a_1 xR - a_2)u_x + e^{-\rho t}F(a_2) \right\} = 0, \]

with the boundary conditions that

(7.20) \[ u(0,t) = 0, \quad u(x,T) = 0. \]

We compute the maxima to find

(7.21) \[ \alpha^{1*} = \frac{-(R - r)u_x}{\sigma^2 x u_{xx}}, \quad F'(\alpha^{2*}) = e^{\rho t}u_x, \]

provided that the constraints $0 \le \alpha^{1*} \le 1$ and $0 \le \alpha^{2*}$ are valid: we will need to
worry about this later. If we can find a formula for the value function u, we will
then be able to use (7.21) to compute optimal controls.

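Finding an explicit solution. To go further, we assume the utility function F
has the explicit form

\[ F(a) = a^\gamma \quad (0 < \gamma < 1). \]

Next we guess that our value function has the form

\[ u(x,t) = g(t)x^\gamma, \]

for some function g to be determined. Then (7.21) implies that

\[ \alpha^{1*} = \frac{R - r}{\sigma^2(1 - \gamma)}, \quad \alpha^{2*} = [e^{\rho t}g(t)]^{\frac{1}{\gamma - 1}}\,x. \]

Plugging our guess for the form of u into (7.19) and setting $a_1 = \alpha^{1*}$, $a_2 = \alpha^{2*}$, we
find

\[ \left( g'(t) + \nu\gamma g(t) + (1 - \gamma)g(t)(e^{\rho t}g(t))^{\frac{1}{\gamma - 1}} \right) x^\gamma = 0 \]

for the constant

\[ \nu := \frac{(R - r)^2}{2\sigma^2(1 - \gamma)} + r. \]

Now put

\[ h(t) := (e^{\rho t}g(t))^{\frac{1}{1 - \gamma}} \]

to obtain a linear ODE for h. Then we find

\[ g(t) = e^{-\rho t}\left[ \frac{1 - \gamma}{\rho - \nu\gamma}\left( 1 - e^{-\frac{(\rho - \nu\gamma)(T - t)}{1 - \gamma}} \right) \right]^{1 - \gamma}. \]

If $R - r \le \sigma^2(1 - \gamma)$, then $0 \le \alpha^{1*} \le 1$ and $\alpha^{2*} \ge 0$ as required. □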
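The closed-form answer can be evaluated and sanity-checked numerically. The sketch below is only an illustration: it codes the formulas for the constant nu, the function g, and the two optimal controls as stated above, and then checks by a centered finite difference that g satisfies the ODE g' + nu*gamma*g + (1 - gamma) g (e^{rho t} g)^{1/(gamma-1)} = 0. The function name `merton_solution` and the sample parameter values are hypothetical, and the code assumes rho - nu*gamma is nonzero.

```python
import math


def merton_solution(R, r, sigma, gamma, rho, T):
    """Return nu, g(t), alpha1*, alpha2*(x, t) for the power utility F(a) = a**gamma."""
    nu = (R - r) ** 2 / (2 * sigma ** 2 * (1 - gamma)) + r
    k = (rho - nu * gamma) / (1 - gamma)  # assumed nonzero in this sketch

    def h(t):
        # h(t) = (e^{rho t} g(t))^{1/(1-gamma)}; solves the linear ODE h' = k h - 1, h(T) = 0
        return (1.0 - math.exp(-k * (T - t))) / k

    def g(t):
        return math.exp(-rho * t) * h(t) ** (1 - gamma)

    alpha1 = (R - r) / (sigma ** 2 * (1 - gamma))  # constant fraction held in the stock

    def alpha2(x, t):
        # consumption rate [e^{rho t} g(t)]^{1/(gamma-1)} x = x / h(t), for t < T
        return x / h(t)

    return nu, g, alpha1, alpha2


if __name__ == "__main__":
    R, r, sigma, gamma, rho, T = 0.08, 0.03, 0.4, 0.5, 0.1, 1.0
    nu, g, alpha1, alpha2 = merton_solution(R, r, sigma, gamma, rho, T)
    t, eps = 0.5, 1e-5
    dg = (g(t + eps) - g(t - eps)) / (2 * eps)
    residual = (dg + nu * gamma * g(t)
                + (1 - gamma) * g(t) * (math.exp(rho * t) * g(t)) ** (1 / (gamma - 1)))
    print("alpha1* =", alpha1, " ODE residual =", residual)
```

With these sample values R - r = 0.05 is below sigma^2 (1 - gamma) = 0.08, so the constraint on the stock fraction holds.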




7.7 REFERENCES 

The lecture notes [E], available online, present a fast but more detailed
discussion of stochastic differential equations. See also Oksendal's nice book [O].

Good books on stochastic optimal control include Fleming-Rishel [F-R],
Fleming-Soner [F-S], and Krylov [Kr].
APPENDIX: PROOFS OF THE PONTRYAGIN MAXIMUM PRINCIPLE



A.1 An informal derivation

A.2 Simple control variations

A.3 Free endpoint problem, no running payoff

A.4 Free endpoint problem with running payoffs

A.5 Multiple control variations

A.6 Fixed endpoint problem

A.7 References



A.1. AN INFORMAL DERIVATION.

In this first section we present a quick and informative, but imprecise, study of 
variations for the free endpoint problem. This discussion motivates the introduction 
of the control theory Hamiltonian H(x,p,a) and the costates p*(-), but will not 
provide all the information found in the full maximum principle. Sections A.2-A.6 
will build upon the ideas we introduce here. 

Adjoint linear dynamics. Let us start by considering an initial value problem
for a simple time-dependent linear system of ODE, having the form

\[ \begin{cases} \dot y(t) = A(t)y(t) & (0 \le t \le T) \\ y(0) = y^0. \end{cases} \]

The corresponding adjoint equation reads

\[ \begin{cases} \dot p(t) = -A^T(t)p(t) & (0 \le t \le T) \\ p(T) = p^0. \end{cases} \]

Note that this is a terminal value problem. To understand why we introduce the
adjoint equation, look at this calculation:

\[ \begin{aligned}
\frac{d}{dt}(p \cdot y) &= \dot p \cdot y + p \cdot \dot y \\
&= -(A^T p) \cdot y + p \cdot (Ay) \\
&= -p \cdot (Ay) + p \cdot (Ay) \\
&= 0.
\end{aligned} \]

It follows that $t \mapsto y(t) \cdot p(t)$ is constant, and therefore $y(T) \cdot p^0 = y^0 \cdot p(0)$. The
point is that by introducing the adjoint dynamics, we get a formula involving $y(T)$,
which as we will later see is sometimes very useful.
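The conservation of $p \cdot y$ is easy to confirm numerically. The sketch below is an illustration only: it picks a hypothetical 2x2 time-dependent matrix A(t), integrates the forward equation for y from 0 to T and the adjoint equation for p backwards from T to 0 with a classical fourth-order Runge-Kutta scheme, and compares the two pairings $y(T) \cdot p^0$ and $y^0 \cdot p(0)$. The matrix, the data, and the function names are assumptions made for the example.

```python
def rk4(rhs, z0, t0, t1, n):
    """Classical RK4 for z' = rhs(t, z) from t0 to t1 (t1 < t0 integrates backwards)."""
    h = (t1 - t0) / n
    z, t = list(z0), t0
    for _ in range(n):
        k1 = rhs(t, z)
        k2 = rhs(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
        k3 = rhs(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
        k4 = rhs(t + h, [zi + h * ki for zi, ki in zip(z, k3)])
        z = [zi + h / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        t += h
    return z


def y_rhs(t, y):
    # y' = A(t) y with the hypothetical matrix A(t) = [[0, 1], [-(1 + t), -0.5]]
    return [y[1], -(1.0 + t) * y[0] - 0.5 * y[1]]


def p_rhs(t, p):
    # p' = -A(t)^T p for the same matrix A(t)
    return [(1.0 + t) * p[1], -p[0] + 0.5 * p[1]]


def adjoint_pairing(T=2.0, n=2000):
    """Return (y(T) . p^0, y^0 . p(0)); the adjoint identity says these agree."""
    y0, pT = [1.0, 0.0], [0.3, -0.7]
    yT = rk4(y_rhs, y0, 0.0, T, n)  # forward in time
    p0 = rk4(p_rhs, pT, T, 0.0, n)  # backward in time from the terminal condition
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    return dot(yT, pT), dot(y0, p0)


if __name__ == "__main__":
    print(adjoint_pairing())
```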

Variations of the control. We turn now to our basic free endpoint control
problem with the dynamics

(ODE) \[ \begin{cases} \dot x(t) = f(x(t),\alpha(t)) & (t \ge 0) \\ x(0) = x^0, \end{cases} \]

and payoff

(P) \[ P[\alpha(\cdot)] = \int_0^T r(x(s),\alpha(s))\,ds + g(x(T)). \]

Let $\alpha^*(\cdot)$ be an optimal control, corresponding to the optimal trajectory $x^*(\cdot)$.
Our plan is to compute "variations" of the optimal control and to use an adjoint
equation as above to simplify the resulting expression. To simplify notation, we
drop the superscript * for the time being.

Select $\varepsilon > 0$ and define the variation

\[ \alpha_\varepsilon(t) := \alpha(t) + \varepsilon\beta(t) \quad (0 \le t \le T), \]

where $\beta(\cdot)$ is some given function selected so that $\alpha_\varepsilon(\cdot)$ is an admissible control for
all sufficiently small $\varepsilon > 0$. Let us call such a function $\beta(\cdot)$ an acceptable variation
and for the time being, just assume it exists.

Denote by $x_\varepsilon(\cdot)$ the solution of (ODE) corresponding to the control $\alpha_\varepsilon(\cdot)$. Then
we can write

\[ x_\varepsilon(t) = x(t) + \varepsilon y(t) + o(\varepsilon) \quad (0 \le t \le T), \]

where

(8.1) \[ \begin{cases} \dot y = \nabla_x f(x,\alpha)y + \nabla_a f(x,\alpha)\beta & (0 \le t \le T) \\ y(0) = 0. \end{cases} \]

We will sometimes write this linear ODE as $\dot y = A(t)y + \nabla_a f\,\beta$ for
$A(t) := \nabla_x f(x(t),\alpha(t))$.

Variations of the payoff. Next, let us compute the variation in the payoff, by
observing that

\[ \frac{d}{d\varepsilon}P[\alpha_\varepsilon(\cdot)]\Big|_{\varepsilon=0} \le 0, \]

since the control $\alpha(\cdot)$ maximizes the payoff. Putting $\alpha_\varepsilon(\cdot)$ in the formula for $P[\cdot]$
and differentiating with respect to $\varepsilon$ gives us the identity

(8.2) \[ \frac{d}{d\varepsilon}P[\alpha_\varepsilon(\cdot)]\Big|_{\varepsilon=0} = \int_0^T \nabla_x r(x,\alpha) \cdot y + \nabla_a r(x,\alpha) \cdot \beta\,ds + \nabla g(x(T)) \cdot y(T). \]

We want to extract useful information from this inequality, but run into a major
problem since the integral expression in (8.2) involves not only $\beta$, the variation of the
control, but also y, the corresponding variation of the state. The key idea now is
to use an adjoint equation for a costate p to get rid of all occurrences of y in (8.2)
and thus to express the variation in the payoff in terms of the control variation $\beta$.

Designing the adjoint dynamics. Our opening discussion of the adjoint problem
strongly suggests that we enforce the terminal condition

\[ p(T) = \nabla g(x(T)); \]

so that $\nabla g(x(T)) \cdot y(T) = p(T) \cdot y(T)$, an expression we can presumably rewrite by
computing the time derivative of $p \cdot y$ and integrating. For this to work, our first
guess is that we should require that $p(\cdot)$ should solve $\dot p = -A^T p$. But this does
not quite work, since we need also to get rid of the integral term involving $\nabla_x r \cdot y$.

But after some experimentation, we learn that everything works out if we
require

\[ \begin{cases} \dot p = -(\nabla_x f)^T p - \nabla_x r & (0 \le t \le T) \\ p(T) = \nabla g(x(T)). \end{cases} \]

In other words, we are assuming $\dot p = -A^T p - \nabla_x r$. Now calculate

\[ \begin{aligned}
\frac{d}{dt}(p \cdot y) &= \dot p \cdot y + p \cdot \dot y \\
&= -(A^T p + \nabla_x r) \cdot y + p \cdot (Ay + \nabla_a f\,\beta) \\
&= -\nabla_x r \cdot y + p \cdot \nabla_a f\,\beta.
\end{aligned} \]

Integrating and remembering that $y(0) = 0$, we find

\[ \nabla g(x(T)) \cdot y(T) = \int_0^T p \cdot \nabla_a f\,\beta - \nabla_x r \cdot y\,ds. \]

We plug this expression into (8.2), to learn that

\[ \int_0^T (p \cdot \nabla_a f + \nabla_a r)\,\beta\,ds = \frac{d}{d\varepsilon}P[\alpha_\varepsilon(\cdot)]\Big|_{\varepsilon=0} \le 0, \]

in which f and r are evaluated at $(x,\alpha)$. We have accomplished what we set out to
do, namely to rewrite the variation in the payoff in terms of the control variation
$\beta$.

Information about the optimal control. We rewrite the foregoing by reintroducing
the superscripts *:

(8.3) \[ \int_0^T (p^* \cdot \nabla_a f(x^*,\alpha^*) + \nabla_a r(x^*,\alpha^*))\,\beta\,ds \le 0. \]

The inequality (8.3) must hold for every acceptable variation $\beta(\cdot)$ as above: what
does this tell us?

First, notice that the expression next to $\beta$ within the integral is $\nabla_a H(x^*,p^*,\alpha^*)$
for $H(x,p,a) = f(x,a) \cdot p + r(x,a)$. Our variational methods in this section have
therefore quite naturally led us to the control theory Hamiltonian. Second, that
the inequality (8.3) must hold for all acceptable variations suggests that for each
time, given $x^*(t)$ and $p^*(t)$, we should select $\alpha^*(t)$ as some sort of extremum of
$H(x^*(t),p^*(t),a)$ among control values $a \in A$. And finally, since (8.3) asserts the
integral expression is nonpositive for all acceptable variations, this extremum is
perhaps a maximum. This last is the key assertion of the full Pontryagin Maximum
Principle.

CRITIQUE. To be clear: the variational methods described above do not actually 
imply all this, but are nevertheless suggestive and point us in the correct direction. 
One big problem is that there may exist no acceptable variations in the sense above
except for $\beta(\cdot) \equiv 0$; this is the case if for instance the set A of control values is finite.
The real proof in the following sections must therefore introduce a different sort of 
variation, a so-called simple control variation, in order to extract all the available 
information. 

A.2. SIMPLE CONTROL VARIATIONS. 

Recall that the response $x(\cdot)$ to a given control $\alpha(\cdot)$ is the unique solution of
the system of differential equations:

(ODE) \[ \begin{cases} \dot x(t) = f(x(t),\alpha(t)) & (t \ge 0) \\ x(0) = x^0. \end{cases} \]

We investigate in this section how certain simple changes in the control affect the
response.

DEFINITION. Fix a time $s > 0$ and a control parameter value $a \in A$. Select
$\varepsilon > 0$ so small that $0 < s - \varepsilon < s$ and define then the modified control

\[ \alpha_\varepsilon(t) := \begin{cases} a & \text{if } s - \varepsilon < t < s \\ \alpha(t) & \text{otherwise.} \end{cases} \]

We call $\alpha_\varepsilon(\cdot)$ a simple variation of $\alpha(\cdot)$.

Let $x_\varepsilon(\cdot)$ be the corresponding response to our system:

(8.4) \[ \begin{cases} \dot x_\varepsilon(t) = f(x_\varepsilon(t),\alpha_\varepsilon(t)) & (t > 0) \\ x_\varepsilon(0) = x^0. \end{cases} \]

We want to understand how our choices of s and a cause $x_\varepsilon(\cdot)$ to differ from $x(\cdot)$,
for small $\varepsilon > 0$.

NOTATION. Define the matrix-valued function $A : [0,\infty) \to \mathbb{M}^{n \times n}$ by

\[ A(t) := \nabla_x f(x(t),\alpha(t)). \]

In particular, the (i,j)-th entry of the matrix $A(t)$ is

\[ f^i_{x_j}(x(t),\alpha(t)) \quad (1 \le i,j \le n). \]
We first quote a standard perturbation assertion for ordinary differential
equations:

LEMMA A.1 (CHANGING INITIAL CONDITIONS). Let $y_\varepsilon(\cdot)$ solve the
initial-value problem:

\[ \begin{cases} \dot y_\varepsilon(t) = f(y_\varepsilon(t),\alpha(t)) & (t \ge 0) \\ y_\varepsilon(0) = x^0 + \varepsilon y^0 + o(\varepsilon). \end{cases} \]

Then

\[ y_\varepsilon(t) = x(t) + \varepsilon y(t) + o(\varepsilon) \quad \text{as } \varepsilon \to 0, \]

uniformly for t in compact subsets of $[0,\infty)$, where

\[ \begin{cases} \dot y(t) = A(t)y(t) & (t \ge 0) \\ y(0) = y^0. \end{cases} \]

Returning now to the dynamics (8.4), we establish

LEMMA A.2 (DYNAMICS AND SIMPLE CONTROL VARIATIONS). We
have

\[ x_\varepsilon(t) = x(t) + \varepsilon y(t) + o(\varepsilon) \quad \text{as } \varepsilon \to 0, \]

uniformly for t in compact subsets of $[0,\infty)$, where

\[ y(t) \equiv 0 \quad (0 \le t \le s) \]

and

(8.5) \[ \begin{cases} \dot y(t) = A(t)y(t) & (t \ge s) \\ y(s) = y^s, \end{cases} \]

for

(8.6) \[ y^s := f(x(s),a) - f(x(s),\alpha(s)). \]

NOTATION. We will sometimes write

\[ y(t) = Y(t,s)y^s \quad (t \ge s) \]

when (8.5) holds.

Proof. Clearly $x_\varepsilon(t) = x(t)$ for $0 \le t \le s - \varepsilon$. For times $s - \varepsilon \le t \le s$, we
have

\[ x_\varepsilon(t) - x(t) = \int_{s-\varepsilon}^t f(x_\varepsilon(r),a) - f(x(r),\alpha(r))\,dr + o(\varepsilon). \]

Thus, in particular,

\[ x_\varepsilon(s) - x(s) = [f(x(s),a) - f(x(s),\alpha(s))]\,\varepsilon + o(\varepsilon). \]

On the time interval $[s,\infty)$, $x(\cdot)$ and $x_\varepsilon(\cdot)$ both solve the same ODE, but with
differing initial conditions given by

\[ x_\varepsilon(s) = x(s) + \varepsilon y^s + o(\varepsilon), \]

for $y^s$ defined by (8.6).

According to Lemma A.1, we have

\[ x_\varepsilon(t) = x(t) + \varepsilon y(t) + o(\varepsilon) \quad (t \ge s), \]

the function $y(\cdot)$ solving (8.5). □
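Lemma A.2 can be checked on a toy example. The sketch below is an illustration under assumptions of my own choosing: scalar dynamics f(x,a) = -x + a, the base control identically 0, and a simple variation that switches to the value a = 1 on the interval (s - eps, s). Here A(t) = -1 and $y^s$ = f(x(s),1) - f(x(s),0) = 1, so the lemma predicts that (x_eps(T) - x(T)) / eps is close to e^{-(T-s)} for small eps. All function names and numerical values are hypothetical.

```python
import math


def f(x, a):
    """Hypothetical scalar dynamics used only for this illustration."""
    return -x + a


def flow(x, a, duration, n=1000):
    """RK4 for x' = f(x, a) over a time span `duration`, control frozen at a."""
    h = duration / n
    for _ in range(n):
        k1 = f(x, a)
        k2 = f(x + h / 2 * k1, a)
        k3 = f(x + h / 2 * k2, a)
        k4 = f(x + h * k3, a)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def response_at_T(x0, s, T, eps, a_var, a_base=0.0):
    """State at time T when the control equals a_var on (s - eps, s) and a_base elsewhere."""
    x = flow(x0, a_base, s - eps)
    x = flow(x, a_var, eps)
    return flow(x, a_base, T - s)


def sensitivity_ratio(x0=2.0, s=1.0, T=2.0, eps=1e-3, a_var=1.0):
    """Difference quotient (x_eps(T) - x(T)) / eps for the simple variation."""
    base = response_at_T(x0, s, T, 0.0, a_var)
    perturbed = response_at_T(x0, s, T, eps, a_var)
    return (perturbed - base) / eps


if __name__ == "__main__":
    predicted = math.exp(-(2.0 - 1.0)) * 1.0  # Y(T, s) y^s with A = -1 and y^s = 1
    print(sensitivity_ratio(), predicted)
```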

A.3. FREE ENDPOINT PROBLEM, NO RUNNING COST. 

STATEMENT. We return to our usual dynamics

(ODE) \[ \begin{cases} \dot x(t) = f(x(t),\alpha(t)) & (0 \le t \le T) \\ x(0) = x^0, \end{cases} \]

and introduce also the terminal payoff functional

(P) \[ P[\alpha(\cdot)] = g(x(T)), \]

to be maximized. We assume that $\alpha^*(\cdot)$ is an optimal control for this problem,
corresponding to the optimal trajectory $x^*(\cdot)$.

We are taking the running payoff $r \equiv 0$, and hence the control theory
Hamiltonian is

\[ H(x,p,a) = f(x,a) \cdot p. \]

We must find $p^* : [0,T] \to \mathbb{R}^n$, such that

(ADJ) \[ \dot p^*(t) = -\nabla_x H(x^*(t),p^*(t),\alpha^*(t)) \quad (0 \le t \le T) \]

and

(M) \[ H(x^*(t),p^*(t),\alpha^*(t)) = \max_{a \in A} H(x^*(t),p^*(t),a). \]

To simplify notation we henceforth drop the superscript * and so write $x(\cdot)$ for
$x^*(\cdot)$, $\alpha(\cdot)$ for $\alpha^*(\cdot)$, etc. Introduce the function $A(\cdot) = \nabla_x f(x(\cdot),\alpha(\cdot))$ and the
control variation $\alpha_\varepsilon(\cdot)$, as in the previous section.

THE COSTATE. We now define $p : [0,T] \to \mathbb{R}^n$ to be the unique solution of the
terminal-value problem

(8.7) \[ \begin{cases} \dot p(t) = -A^T(t)p(t) & (0 \le t \le T) \\ p(T) = \nabla g(x(T)). \end{cases} \]



We employ p(-) to help us calculate the variation of the terminal payoff: 
LEMMA A.3 (VARIATION OF TERMINAL PAYOFF). We have

(8.8) \[ \frac{d}{d\varepsilon}P[\alpha_\varepsilon(\cdot)]\Big|_{\varepsilon=0} = p(s) \cdot [f(x(s),a) - f(x(s),\alpha(s))]. \]

Proof. According to Lemma A.2,

\[ P[\alpha_\varepsilon(\cdot)] = g(x_\varepsilon(T)) = g(x(T) + \varepsilon y(T) + o(\varepsilon)), \]

where $y(\cdot)$ satisfies (8.5). We then compute

(8.9) \[ \frac{d}{d\varepsilon}P[\alpha_\varepsilon(\cdot)]\Big|_{\varepsilon=0} = \nabla g(x(T)) \cdot y(T). \]

On the other hand, (8.5) and (8.7) imply

\[ \begin{aligned}
\frac{d}{dt}(p(t) \cdot y(t)) &= \dot p(t) \cdot y(t) + p(t) \cdot \dot y(t) \\
&= -A^T(t)p(t) \cdot y(t) + p(t) \cdot A(t)y(t) \\
&= 0.
\end{aligned} \]

Hence

\[ \nabla g(x(T)) \cdot y(T) = p(T) \cdot y(T) = p(s) \cdot y(s) = p(s) \cdot y^s. \]

Since $y^s = f(x(s),a) - f(x(s),\alpha(s))$, this identity and (8.9) imply (8.8). □



We now restore the superscripts * in our notation. 

THEOREM A.4 (PONTRYAGIN MAXIMUM PRINCIPLE). There exists
a function $p^* : [0,T] \to \mathbb{R}^n$ satisfying the adjoint dynamics (ADJ), the
maximization principle (M) and the terminal/transversality condition (T).

Proof. The adjoint dynamics and terminal condition are both in (8.7). To
confirm (M), fix $0 < s < T$ and $a \in A$, as above. Since the mapping $\varepsilon \mapsto P[\alpha_\varepsilon(\cdot)]$
for $0 \le \varepsilon \le 1$ has a maximum at $\varepsilon = 0$, we deduce from Lemma A.3 that

\[ 0 \ge \frac{d}{d\varepsilon}P[\alpha_\varepsilon(\cdot)]\Big|_{\varepsilon=0} = p^*(s) \cdot [f(x^*(s),a) - f(x^*(s),\alpha^*(s))]. \]

Hence

\[ \begin{aligned}
H(x^*(s),p^*(s),a) &= f(x^*(s),a) \cdot p^*(s) \\
&\le f(x^*(s),\alpha^*(s)) \cdot p^*(s) = H(x^*(s),p^*(s),\alpha^*(s))
\end{aligned} \]

for each $0 < s < T$ and $a \in A$. This proves the maximization condition (M). □



A.4. FREE ENDPOINT PROBLEM WITH RUNNING COSTS 
We next cover the case that the payoff functional includes a running payoff:

(P) \[ P[\alpha(\cdot)] = \int_0^T r(x(s),\alpha(s))\,ds + g(x(T)). \]

The control theory Hamiltonian is now

\[ H(x,p,a) = f(x,a) \cdot p + r(x,a) \]

and we must manufacture a costate function $p^*(\cdot)$ satisfying (ADJ), (M) and (T).

ADDING A NEW VARIABLE. The trick is to introduce another variable
and thereby convert to the previous case. We consider the function $x^{n+1} : [0,T] \to \mathbb{R}$
given by

\[ \begin{cases} \dot x^{n+1}(t) = r(x(t),\alpha(t)) & (0 \le t \le T) \\ x^{n+1}(0) = 0, \end{cases} \]

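where $x(\cdot)$ solves (ODE). Introduce next the new notation

(8.10) \[ \begin{aligned}
\bar x &:= \begin{pmatrix} x \\ x_{n+1} \end{pmatrix} = (x_1, \dots, x_n, x_{n+1})^T, &
\bar x^0 &:= \begin{pmatrix} x^0 \\ 0 \end{pmatrix} = (x^0_1, \dots, x^0_n, 0)^T, \\
\bar x(t) &:= \begin{pmatrix} x(t) \\ x^{n+1}(t) \end{pmatrix} = (x^1(t), \dots, x^n(t), x^{n+1}(t))^T, &
\bar f(\bar x,a) &:= \begin{pmatrix} f(x,a) \\ r(x,a) \end{pmatrix} = (f^1(x,a), \dots, f^n(x,a), r(x,a))^T,
\end{aligned} \]

and

\[ \bar g(\bar x) := g(x) + x_{n+1}. \]

Then (ODE) and (8.10) produce the dynamics

($\overline{\mathrm{ODE}}$) \[ \begin{cases} \dot{\bar x}(t) = \bar f(\bar x(t),\alpha(t)) & (0 \le t \le T) \\ \bar x(0) = \bar x^0. \end{cases} \]

Consequently our control problem transforms into a new problem with no running
payoff and the terminal payoff functional

($\overline{\mathrm{P}}$) \[ \bar P[\alpha(\cdot)] := \bar g(\bar x(T)). \]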

We apply Theorem A.4, to obtain $\bar p^* : [0,T] \to \mathbb{R}^{n+1}$ satisfying ($\overline{\mathrm{M}}$) for the
Hamiltonian

(8.11) \[ \bar H(\bar x,\bar p,a) = \bar f(\bar x,a) \cdot \bar p. \]

Also the adjoint equations ($\overline{\mathrm{ADJ}}$) hold, with the terminal transversality condition

($\overline{\mathrm{T}}$) \[ \bar p^*(T) = \nabla \bar g(\bar x^*(T)). \]

But $\bar f$ does not depend upon the variable $x_{n+1}$, and so the (n+1)-th equation in the
adjoint equations ($\overline{\mathrm{ADJ}}$) reads

\[ \dot p^{n+1,*}(t) = -\bar H_{x_{n+1}} = 0. \]

Since $\bar g_{x_{n+1}} = 1$, we deduce that

(8.12) \[ p^{n+1,*}(t) \equiv 1. \]

As the (n+1)-th component of the vector function $\bar f$ is r, we then conclude from
(8.11) that

\[ \bar H(\bar x,\bar p,a) = f(x,a) \cdot p + r(x,a) = H(x,p,a). \]

Therefore

\[ p^*(t) := (p^{1,*}(t), \dots, p^{n,*}(t))^T \]

satisfies (ADJ), (M) for the Hamiltonian H. □
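The same augmentation trick is how one would typically evaluate a payoff with a running term in code: append the extra component $x^{n+1}$ with derivative r to the state and read off the terminal payoff $g(x(T)) + x^{n+1}(T)$ at the end. The sketch below illustrates this for a scalar state; the dynamics, payoffs, control, RK4 integrator, and the name `payoff_via_augmented_state` are all assumptions made for the example.

```python
import math


def payoff_via_augmented_state(x0, T, f, r, g, alpha, n=2000):
    """Integrate (x, x_extra)' = (f(x, a), r(x, a)) with RK4 and return g(x(T)) + x_extra(T)."""

    def fbar(t, z):
        x, _ = z           # the extra component never feeds back into the dynamics
        a = alpha(t)
        return (f(x, a), r(x, a))

    h = T / n
    z, t = (x0, 0.0), 0.0  # augmented initial condition (x^0, 0)
    for _ in range(n):
        k1 = fbar(t, z)
        k2 = fbar(t + h / 2, (z[0] + h / 2 * k1[0], z[1] + h / 2 * k1[1]))
        k3 = fbar(t + h / 2, (z[0] + h / 2 * k2[0], z[1] + h / 2 * k2[1]))
        k4 = fbar(t + h, (z[0] + h * k3[0], z[1] + h * k3[1]))
        z = tuple(z[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2))
        t += h
    return g(z[0]) + z[1]


if __name__ == "__main__":
    # Hypothetical data: x' = -x + a, running payoff r = x, terminal payoff g = 2x, control 0.
    value = payoff_via_augmented_state(
        x0=1.0, T=1.0,
        f=lambda x, a: -x + a,
        r=lambda x, a: x,
        g=lambda x: 2.0 * x,
        alpha=lambda t: 0.0,
    )
    # With x(t) = e^{-t}: the running part is 1 - e^{-1} and the terminal part is 2 e^{-1}.
    print(value, 1.0 + math.exp(-1.0))
```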
A.5. MULTIPLE CONTROL VARIATIONS.

To derive the Pontryagin Maximum Principle for the fixed endpoint problem in
§A.6 we will need to introduce some more complicated control variations, discussed
in this section.

DEFINITION. Let us select times $0 < s_1 < s_2 < \dots < s_N$, positive numbers
$0 < \lambda_1, \dots, \lambda_N$, and also control parameters $a_1, a_2, \dots, a_N \in A$.
We generalize our earlier definition of a simple variation in §A.2 by now defining

(8.13) \[ \alpha_\varepsilon(t) := \begin{cases} a_k & \text{if } s_k - \lambda_k\varepsilon \le t < s_k \quad (k = 1, \dots, N) \\ \alpha(t) & \text{otherwise,} \end{cases} \]

for $\varepsilon > 0$ taken so small that the intervals $[s_k - \lambda_k\varepsilon, s_k]$ do not overlap. This we
will call a multiple variation of the control $\alpha(\cdot)$.

Let $x_\varepsilon(\cdot)$ be the corresponding response of our system:

(8.14) \[ \begin{cases} \dot x_\varepsilon(t) = f(x_\varepsilon(t),\alpha_\varepsilon(t)) & (t \ge 0) \\ x_\varepsilon(0) = x^0. \end{cases} \]

NOTATION. (i) As before, $A(\cdot) = \nabla_x f(x(\cdot),\alpha(\cdot))$ and we write

(8.15) \[ y(t) = Y(t,s)y^s \quad (t \ge s) \]

to denote the solution of

(8.16) \[ \begin{cases} \dot y(t) = A(t)y(t) & (t \ge s) \\ y(s) = y^s, \end{cases} \]

where $y^s \in \mathbb{R}^n$ is given.
(ii) Define

(8.17) \[ y^{s_k} := f(x(s_k),a_k) - f(x(s_k),\alpha(s_k)) \]

for $k = 1, \dots, N$.

We next generalize Lemma A.2:

LEMMA A.5 (MULTIPLE CONTROL VARIATIONS). We have

(8.18) \[ x_\varepsilon(t) = x(t) + \varepsilon y(t) + o(\varepsilon) \quad \text{as } \varepsilon \to 0, \]

uniformly for t in compact subsets of $[0,\infty)$, where

(8.19) \[ \begin{cases}
y(t) = 0 & (0 \le t \le s_1) \\
y(t) = \sum_{k=1}^m \lambda_k Y(t,s_k)y^{s_k} & (s_m \le t \le s_{m+1},\ m = 1, \dots, N-1) \\
y(t) = \sum_{k=1}^N \lambda_k Y(t,s_k)y^{s_k} & (s_N \le t).
\end{cases} \]

DEFINITION. The cone of variations at time t is the set

(8.20) \[ K(t) := \left\{ \sum_{k=1}^N \lambda_k Y(t,s_k)y^{s_k} \ \Big|\ N = 1, 2, \dots,\ \lambda_k > 0,\ a_k \in A,\ 0 < s_1 \le s_2 \le \dots \le s_N < t \right\}. \]

Observe that $K(t)$ is a convex cone in $\mathbb{R}^n$, which according to Lemma A.5
consists of all changes in the state $x(t)$ (up to order $\varepsilon$) we can effect by multiple
variations of the control $\alpha(\cdot)$.

We will study the geometry of K(t) in the next section, and for this will require 
the following topological lemma: 

LEMMA A.6 (ZEROES OF A VECTOR FIELD). Let S denote a closed,
bounded, convex subset of $\mathbb{R}^n$ and assume p is a point in the interior of S. Suppose

\[ \Phi : S \to \mathbb{R}^n \]

is a continuous vector field that satisfies the strict inequalities

(8.21) \[ |\Phi(x) - x| < |x - p| \quad \text{for all } x \in \partial S. \]

Then there exists a point $x \in S$ such that

(8.22) \[ \Phi(x) = p. \]

Proof. 1. Suppose first that S is the unit ball $B(0,1)$ and $p = 0$. Squaring
(8.21), we deduce that

\[ \Phi(x) \cdot x > 0 \quad \text{for all } x \in \partial B(0,1). \]

Then for small $t > 0$, the continuous mapping

\[ \Psi(x) := x - t\Phi(x) \]

maps $B(0,1)$ into itself, and hence has a fixed point $x^*$ according to Brouwer's
Fixed Point Theorem. And then $\Phi(x^*) = 0$.

2. In the general case, we can always assume after a translation that $p = 0$.
Then 0 belongs to the interior of S. We next map S onto $B(0,1)$ by radial dilation,
and map $\Phi$ by rigid motion. This process converts the problem to the previous
situation. □



A.6. FIXED ENDPOINT PROBLEM. 

In this last section we treat the fixed endpoint problem, characterized by the
constraint

(8.23) \[ x(\tau) = x^1, \]

where $\tau = \tau[\alpha(\cdot)]$ is the first time that $x(\cdot)$ hits the given target point $x^1 \in \mathbb{R}^n$.
The payoff functional is

(P) \[ P[\alpha(\cdot)] = \int_0^\tau r(x(s),\alpha(s))\,ds. \]



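ADDING A NEW VARIABLE. As in §A.4 we define the function
$x^{n+1} : [0,\tau] \to \mathbb{R}$ by

\[ \begin{cases} \dot x^{n+1}(t) = r(x(t),\alpha(t)) & (0 \le t \le \tau) \\ x^{n+1}(0) = 0, \end{cases} \]

and reintroduce the notation

\[ \begin{aligned}
\bar x &:= \begin{pmatrix} x \\ x_{n+1} \end{pmatrix} = (x_1, \dots, x_n, x_{n+1})^T, &
\bar x^0 &:= \begin{pmatrix} x^0 \\ 0 \end{pmatrix} = (x^0_1, \dots, x^0_n, 0)^T, \\
\bar x(t) &:= \begin{pmatrix} x(t) \\ x^{n+1}(t) \end{pmatrix} = (x^1(t), \dots, x^n(t), x^{n+1}(t))^T, &
\bar f(\bar x,a) &:= \begin{pmatrix} f(x,a) \\ r(x,a) \end{pmatrix} = (f^1(x,a), \dots, f^n(x,a), r(x,a))^T,
\end{aligned} \]

with

\[ \bar g(\bar x) = x_{n+1}. \]

The problem is therefore to find controlled dynamics satisfying

($\overline{\mathrm{ODE}}$) \[ \begin{cases} \dot{\bar x}(t) = \bar f(\bar x(t),\alpha(t)) & (0 \le t \le \tau) \\ \bar x(0) = \bar x^0, \end{cases} \]

and maximizing

($\overline{\mathrm{P}}$) \[ \bar g(\bar x(\tau)) = x^{n+1}(\tau), \]

$\tau$ being the first time that $x(\tau) = x^1$. In other words, the first n components of
$\bar x(\tau)$ are prescribed, and we want to maximize the (n+1)-th component.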

We assume that $\alpha^*(\cdot)$ is an optimal control for this problem, corresponding
to the optimal trajectory $\bar x^*(\cdot)$; our task is to construct the corresponding costate
$\bar p^*(\cdot)$, satisfying the maximization principle ($\overline{\mathrm{M}}$). As usual, we drop the superscript
* to simplify notation.

THE CONE OF VARIATIONS. We will employ the notation and theory from
the previous section, changed only in that we now work with n+1 variables (as we
will be reminded by the overbar on various expressions).

Our program for building the costate depends upon our taking multiple
variations, as in §A.5, and understanding the resulting cone of variations at time $\tau$:

(8.24) \[ K = K(\tau) := \left\{ \sum_{k=1}^N \lambda_k \bar Y(\tau,s_k)\bar y^{s_k} \ \Big|\ N = 1, 2, \dots,\ \lambda_k > 0,\ a_k \in A,\ 0 < s_1 \le s_2 \le \dots \le s_N < \tau \right\}, \]

for

(8.25) \[ \bar y^{s_k} := \bar f(\bar x(s_k),a_k) - \bar f(\bar x(s_k),\alpha(s_k)). \]

We are now writing

(8.26) \[ \bar y(t) = \bar Y(t,s)\bar y^s \]

for the solution of

(8.27) \[ \begin{cases} \dot{\bar y}(t) = \bar A(t)\bar y(t) & (s \le t \le \tau) \\ \bar y(s) = \bar y^s, \end{cases} \]

with $\bar A(\cdot) := \nabla_{\bar x}\bar f(\bar x(\cdot),\alpha(\cdot))$.
LEMMA A.7 (GEOMETRY OF THE CONE OF VARIATIONS). We
have

(8.28) \[ e^{n+1} \notin K^0. \]

Here $K^0$ denotes the interior of K and $e^k = (0, \dots, 1, \dots, 0)^T$, the 1 in the k-th
slot.

Proof. 1. If (8.28) were false, there would then exist n+1 linearly independent
vectors $z^1, \dots, z^{n+1} \in K$ such that

\[ e^{n+1} = \sum_{k=1}^{n+1} \lambda_k z^k \]

with positive constants

\[ \lambda_k > 0 \]

and

(8.29) \[ z^k = \bar Y(\tau,s_k)\bar y^{s_k} \]

for appropriate times $0 < s_1 < s_2 < \dots < s_{n+1} < \tau$ and vectors
$\bar y^{s_k} = \bar f(\bar x(s_k),a_k) - \bar f(\bar x(s_k),\alpha(s_k))$, for $k = 1, \dots, n+1$.

2. We will next construct a control $\alpha_\varepsilon(\cdot)$, having the multiple variation form
(8.13), with corresponding response $\bar x_\varepsilon(\cdot) = (x_\varepsilon(\cdot)^T, x_\varepsilon^{n+1}(\cdot))^T$ satisfying

(8.30) \[ x_\varepsilon(\tau) = x^1 \]

and

(8.31) \[ x_\varepsilon^{n+1}(\tau) > x^{n+1}(\tau). \]

This will be a contradiction to the optimality of the control $\alpha(\cdot)$: (8.30) says that
the new control satisfies the endpoint constraint and (8.31) says it increases the
payoff.

3. Introduce for small $\eta > 0$ the closed and convex set

\[ S := \left\{ x = \sum_{k=1}^{n+1} \lambda_k z^k \ \Big|\ 0 \le \lambda_k \le \eta \right\}. \]

Since the vectors $z^1, \dots, z^{n+1}$ are independent, S has an interior.
Now define for small $\varepsilon > 0$ the mapping

\[ \Phi_\varepsilon : S \to \mathbb{R}^{n+1} \]

by setting

\[ \Phi_\varepsilon(x) := \bar x_\varepsilon(\tau) - \bar x(\tau) \]

for $x = \sum_{k=1}^{n+1} \lambda_k z^k$, where $\bar x_\varepsilon(\cdot)$ solves (8.14) for the control $\alpha_\varepsilon(\cdot)$ defined by (8.13).

We assert that if $\mu, \eta, \varepsilon > 0$ are small enough, then

\[ \Phi_\varepsilon(x) = p := \mu e^{n+1} = (0, \dots, 0, \mu)^T \]

for some $x \in S$. To see this, note that

\[ \begin{aligned}
|\Phi_\varepsilon(x) - x| = |\bar x_\varepsilon(\tau) - \bar x(\tau) - x| &= o(|x|) \quad \text{as } x \to 0,\ x \in S \\
&< |x - p| \quad \text{for all } x \in \partial S.
\end{aligned} \]

Now apply Lemma A.6. □



EXISTENCE OF THE COSTATE. We now restore the superscripts * and so 
write x*(-) for x(-), etc. 

THEOREM A.8 (PONTRYAGIN MAXIMUM PRINCIPLE). Assuming
our problem is not abnormal, there exists a function $p^* : [0,\tau^*] \to \mathbb{R}^n$ satisfying the
adjoint dynamics (ADJ) and the maximization principle (M).

The proof explains what "abnormal" means in this context. 

Proof. 1. Since $e^{n+1} \notin K^0$ according to Lemma A.7, there is a nonzero vector
$w \in \mathbb{R}^{n+1}$ such that

(8.32) \[ w \cdot z \le 0 \quad \text{for all } z \in K \]

and

(8.33) \[ w^{n+1} \ge 0. \]

Let $\bar p^*(\cdot)$ solve ($\overline{\mathrm{ADJ}}$), with the terminal condition

\[ \bar p^*(\tau) = w. \]

Then

(8.34) \[ p^{n+1,*}(\cdot) \equiv w^{n+1} \ge 0. \]

Fix any time $0 \le s < \tau$, any control value $a \in A$, and set

\[ \bar y^s := \bar f(\bar x^*(s),a) - \bar f(\bar x^*(s),\alpha^*(s)). \]

Now solve

\[ \begin{cases} \dot{\bar y}(t) = \bar A(t)\bar y(t) & (s \le t \le \tau) \\ \bar y(s) = \bar y^s; \end{cases} \]

so that, as in §A.3,

\[ 0 \ge w \cdot \bar y(\tau) = \bar p^*(\tau) \cdot \bar y(\tau) = \bar p^*(s) \cdot \bar y(s) = \bar p^*(s) \cdot \bar y^s. \]

Therefore

\[ \bar p^*(s) \cdot [\bar f(\bar x^*(s),a) - \bar f(\bar x^*(s),\alpha^*(s))] \le 0; \]

and then

(8.35) \[ \begin{aligned}
\bar H(\bar x^*(s),\bar p^*(s),a) &= \bar f(\bar x^*(s),a) \cdot \bar p^*(s) \\
&\le \bar f(\bar x^*(s),\alpha^*(s)) \cdot \bar p^*(s) = \bar H(\bar x^*(s),\bar p^*(s),\alpha^*(s)),
\end{aligned} \]

for the Hamiltonian

\[ \bar H(\bar x,\bar p,a) = \bar f(\bar x,a) \cdot \bar p. \]

2. We now must address two situations, according to whether

(8.36) \[ w^{n+1} > 0 \]

or

(8.37) \[ w^{n+1} = 0. \]

When (8.36) holds, we can divide $\bar p^*(\cdot)$ by the absolute value of $w^{n+1}$ and recall
(8.34) to reduce to the case that

\[ p^{n+1,*}(\cdot) \equiv 1. \]

Then, as in §A.4, the maximization formula (8.35) implies

\[ H(x^*(s),p^*(s),a) \le H(x^*(s),p^*(s),\alpha^*(s)) \]

for

\[ H(x,p,a) = f(x,a) \cdot p + r(x,a). \]

This is the maximization principle (M), as required.

When (8.37) holds, we have an abnormal problem, as discussed in the Remarks 
and Warning after Theorem 4.4. Those comments explain how to reformulate the 
Pontryagin Maximum Principle for abnormal problems. □ 



CRITIQUE. (i) The foregoing proofs are not complete, in that we have silently
passed over certain measurability concerns and also ignored in (8.29) the possibility
that some of the times $s_k$ are equal.

(ii) We have also not (yet) proved that

\[ t \mapsto H(x^*(t),p^*(t),\alpha^*(t)) \text{ is constant} \]

in §A.3 and §A.4, and

\[ H(x^*(t),p^*(t),\alpha^*(t)) \equiv 0 \]

in §A.5. □
A.7. REFERENCES.

We mostly followed Fleming-Rishel [F-R] for §A.2-§A.4 and Macki-Strauss [M- 
S] for §A.5 and §A.6. Another approach is discussed in Craven [Cr]. Hocking [H] 
has a nice heuristic discussion. 